I have to recognize the alphabets drawn on the screen. For this I am taking the Image of the alphabet drawn on the screen and passing it to tesseract. It's giving only 20% of results correctly. It's not recognizing all the characters correctly. How to train the tesseract to recognize the characters drawn on the screen correctly in iOS?
Thanks
Not only Tesseract, if you use any sdk you will not get the 100% accurate result. It mostly depend on the clarity of image, if your image is more clear then accuracy of output is high otherwise it is low.
Suggestion
Tesseract has very low accuracy. I really had a bad experience finding something that would recognize me text, image, etc, until I found VUFORIA.
It is really a great software base on OpenGLand well documented, but has non-ARC it is a bit difficult to modify its classes, so you need to make some search.
Also here you can check a review what does it do: Vuforia's Power
Related
my first hurdle so far is that running tesseract vanilla on images of MTG cards doesn't recognize the card title (honestly that's all I need because I can use that text to pull the rest of the card info from a database) I think the issue might be having to need to train tesseract to recognize the font use in mtg cards but im wondering if it might be an issue with tesseract not looking or not detecting text in a section of the image (specifically the title.)
Edit: including an image of a MTG card for reference.http://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=175263&type=card
Ok so, after asking on reddit programming forums I think I found an answer that I am going to pursue:
The training feature of tesseract is indeed for improving rates for unusual fonts, but that's probably not the reason you have low success.
The environment the text is in is not well controlled - the card background can be a texture in one of five colours plus artifacts and lands. Tesseract greyscales the image before processing, so the contrast between the text and background is not sufficient.
You could put your cards through a preprocessor which mutes coloured areas to white and enhances monotones. That should increase the contrast so tesseract can make out the characters.
If anyone still following thsi believes that above path to be the wrong one to start down, please say so.
TLDR
I believe you're on the right track, doing preprocessing.
But you will need doing both preprocessing AND training Tesseract.
Preprocessing
Basically, you want to get the title text, and only the title text, for Tesseract to read. I suggest you follow the steps below:
Identify the borders of the card.
Cut out the title area for further processing.
Convert the image to black'n'white.
Use contours to identify the exact text area, and crop it out.
Send the image you got to Tesseract.
How to create a basic preprocessing is shown in the YouTube video Automatic MTG card sorting: Part 2 - Automatic perspective correction with OpenCV Also have a look at the third part in that serie.
With this said, there is a number of problems you will encounter. How to handle split cards? Will your algorithm manage white borders? What if the card is rotated or upside-down? just to name a few.
Need for Training
But even if you manage to create a perfect preprocessing algorithm you will still have to train Tesseract. This is due to the special text font used on the cards (which happens to be different fonts depending on the age of the card!).
Consider the card "Kinjalli's Caller".
http://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=435169&type=card
Note how similar the "j" is to the "i". An untrained Tesseract tend to mix them up.
Conclusion
All this considered, my answer to you is that you need to do both preprocessing of the card image AND to train Tesseract.
If you're still interested I would like to suggest that you have a look at this MTG-card reading project on GitHub. That way you don't have to reinvent the wheel.
https://github.com/klanderfri/ReadMagicCard
I'm developing an app which can recognize license plates (ANPR). The first step is to extract the licenses plates from the image. I am using OpenCV to detect the plates based on width/height ratio and this works pretty well:
But as you can see, the OCR results are pretty bad.
I am using tesseract in my Objective C (iOS) environment. These are my init variables when starting the engine:
// init the tesseract engine.
tesseract = new tesseract::TessBaseAPI();
int initRet=tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], [language UTF8String]);
tesseract->SetVariable("tessedit_char_whitelist", "BCDFGHJKLMNPQRSTVWXYZ0123456789-");
tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "1");
tesseract->SetVariable("language_model_penalty_non_dict_word ", "1");
tesseract->SetVariable("load_system_dawg", "0");
How can I improve the results? Do I need to let OpenCV do more image manipulation? Or is there something I can improve with tesseract?
Two things will fix this completely:
Remove everything which is not text from the image. You need to use some CV to find the plate area (for example by color, etc) and then mask out all of the background. You want the input to tesseract to be black and white, where text is black and everything else is white
Remove skew (as mentioned by FrankPI above). tesseract is actually supposed to work okay with skew (see "Tesseract OCR Engine" overview by R. Smith) but on the other hand it doesn't always work, especially if you have a single line as opposed to a few paragraphs. So removing skew manually first is always good, if you can do it reliably. You will probably know the exact shape of the bounding trapezoid of the plate from step 1, so this should not be too hard. In the process of removing skew, you can also remove perspective: all license plates (usually) have the same font, and if you scale them to the same (perspective-free) shape the letter shapes would be exactly the same, that would help text recognition.
Some further pointers...
Don't try to code this at first: take a really easy to OCR (ie: from directly in front, no perspective) picture of a plate, edit it in photoshop (or gimp) and run it through tesseract on the commandline. Keep editing in different ways until this works. For example: select by color (or flood select the letter shapes), fill with black, invert selection, fill with white, perspective transform so corners of plate are a rectangle, etc. Take a bunch of pictures, some harder (maybe from odd angles, etc). Do this with all of them. Once this works completely, think about how to make a CV algorithm that does the same thing you did in photoshop :)
P.S. Also, it is better to start with higher resolution image if possible. It looks like the text in your example is around 14 pixels tall. tesseract works pretty well with 12 point text at 300 dpi, this is about 50 pixels tall, and it works much better at 600 dpi. Try to make your letter size be at least 50 preferably 100 pixels.
P.P.S. Are you doing anything to train tesseract? I think you have to do that, the font here is different enough to be a problem. You probably also need something to recognize (and not penalize) dashes which will be very common in your texts, looks like in the second example "T-" is recognized as H.
I don't know tesseract too much, but I have some information about OCR. Here we go.
In an OCR task you need to be sure that, your train data has the same font that you are trying to recognize. Or if you are trying to recognize multiple fonts, be sure that you have those fonts in your train data to get best performance.
As far as I know, tesseract applies OCR in few different ways: One, you give an image which has multiple letters in it and let tesseract do the segmentation. And other, you give segmented letters to tesseract and only expect it to recognize the letter. Maybe you can try to change the one which you are using.
If you are training recognizer by yourself be sure that you have enough and equally amount of each letter in your train data.
Hope this helps.
I've been working on an iOS app, if you need to improve the results you should train tesseract OCR, this improved 90% for me. Before tranning, OCR results were pretty bad.
So, I used this gist in the past to train tesseract ORC with a licence plate font.
If you are interested, I open-sourced this project some weeks ago on github
Here is my real world example with trying out OCR from my old power meter. I would like to use your OpenCV code so that OpenCV does automatic cropping of image, and I'll do image cleaning scripts.
First image is original image (croped power meter numbers)
Second image is slightly cleaned up image in GIMP, around 50% OCR accuracy in tesseract
Third image is completely cleaned image - 100% OCR recognized without any training!
Now License Plate can be easily recognized by mlmodel. I have created the core model you can find it here . You just need to split characters in 28*28 resolution through vision framework and send this image to VNImageRequestHandler like given below-
let handler = VNImageRequestHandler(cgImage: imageUI.cgImage!, options: [:])
you will get desired results by using my core mlmodel. Use this link for better clarification but use my model for better results in license plate recognition. I have also created the mlmodel for License Plate Recognition.
Suppose I have gray-scale photographic pictures of texts sheets. Each sheet of paper is exactly white and text is exactly black.
Unfortunately, the light is not uniform, also perspective shading occurs, also sheets of papers may be curved. Of course, there are some small hi freq noise on an image.
I AM SURE that there should be nearly IDEAL solution to separate text and background in this situation.
So what is it? :)
I don't believe it is impossible or even hard to turn such gray-scale images into nearly perfect black and white pictures. I cant prove this but I judge on my own perception: I need no any intelligence to recognize such pictures by an eye. They can be in any language even unfamiliar, but I will SEE what is written exactly.
So, how to teach computer to do the same?
UPDATE
Consider original image
Any global thresolding will cause artefacts (1) and nonuniform text representation (2)
I need some thresolding, which looks for local statistics.
Switch to adaptive thresholding.
Here you will find some introduction - http://homepages.inf.ed.ac.uk/rbf/HIPR2/adpthrsh.htm
Adaptive thresholding is designed to deal with exactly this kind of problems.
I am building an iOS app that, as a key feature, incorporates image matching. The problem is the images I need to recognize are small orienteering 10x10 plaques with simple large text on them. They can be quite reflective and will be outside(so the light conditions will be variable). Sample image
There will be up to 15 of these types of image in the pool and really all I need to detect is the text, in order to log where the user has been.
The problem I am facing is that with the image matching software I have tried, aurasma and slightly more successfully arlabs, they can't distinguish between them as they are primarily built to work with detailed images.
I need to accurately detect which plaque is being scanned and have considered using gps to refine the selection but the only reliable way I have found is to get the user to manually enter the text. One of the key attractions we have based the product around is being able to detect these images that are already in place and not have to set up any additional material.
Can anyone suggest a piece of software that would work(as is iOS friendly) or a method of detection that would be effective and interactive/pleasing for the user.
Sample environment:
http://www.orienteeringcoach.com/wp-content/uploads/2012/08/startfinishscp.jpeg
The environment can change substantially, basically anywhere a plaque could be positioned they are; fences, walls, and posts in either wooded or open areas, but overwhelmingly outdoors.
I'm not an iOs programmer, but I will try to answer from an algorithmic point of view. Essentially, you have a detection problem ("Where is the plaque?") and a classification problem ("Which one is it?"). Asking the user to keep the plaque in a pre-defined region is certainly a good idea. This solves the detection problem, which is often harder to solve with limited resources than the classification problem.
For classification, I see two alternatives:
The classic "Computer Vision" route would be feature extraction and classification. Local Binary Patterns and HOG are feature extractors known to be fast enough for mobile (the former more than the latter), and they are not too complicated to implement. Classifiers, however, are non-trivial, and you would probably have to search for an appropriate iOs library.
Alternatively, you could try to binarize the image, i.e. classify pixels as "plate" / white or "text" / black. Then you can use an error-tolerant similarity measure for comparing your binarized image with a binarized reference image of the plaque. The chamfer distance measure is a good candidate. It essentially boils down to comparing the distance transforms of your two binarized images. This is more tolerant to misalignment than comparing binary images directly. The distance transforms of the reference images can be pre-computed and stored on the device.
Personally, I would try the second approach. A (non-mobile) prototype of the second approach is relatively easy to code and evaluate with a good image processing library (OpenCV, Matlab + Image Processing Toolbox, Python, etc).
I managed to find a solution that is working quite well. Im not fully optimized yet but I think its just tweaking filters, as ill explain later on.
Initially I tried to set up opencv but it was very time consuming and a steep learning curve but it did give me an idea. The key to my problem is really detecting the characters within the image and ignoring the background, which was basically just noise. OCR was designed exactly for this purpose.
I found the free library tesseract (https://github.com/ldiqual/tesseract-ios-lib) easy to use and with plenty of customizability. At first the results were very random but applying sharpening and monochromatic filter and a color invert worked well to clean up the text. Next a marked out a target area on the ui and used that to cut out the rectangle of image to process. The speed of processing is slow on large images and this cut it dramatically. The OCR filter allowed me to restrict allowable characters and as the plaques follow a standard configuration this narrowed down the accuracy.
So far its been successful with the grey background plaques but I havent found the correct filter for the red and white editions. My goal will be to add color detection and remove the need to feed in the data type.
I'm trying to do an application which, among other things, is able to recognize chess positions on a computer screen from screenshots. I have very limited experience with image processing techniques and don't wish to invest a great amount of time in studying this, as this is just a pet project of mine.
Can anyone recommend me one or more image processing techniques that would yield me a good result?
The conditions are:
The image is always crispy clean, no noise, poor light conditions etc (since it's a screenshot)
I'm expecting a very low impact on computer performance while doing 1 image / second
I've thought of two modes to start the process:
Feed the piece shapes to the program (so that it knows what a queen, king etc. looks like)
just feed the program an initial image which contains the startup position, from which the program can (after it recognizes the position of the board) pick each chess piece
The process should be relatively easy to understand, as I don't have a very good grasp of image processing techniques (yet)
I'm not interested in using any specific technology, so technology-agnostic documentation would be ideal (C/C++, C#, Java examples would also be fine).
Thanks for taking the time to read this, and I hope to get some good answers.
It' an interesting problem, but you need to specify a lot more than in your original question in order to find an acceptable answer.
On the input images: "screenshots" is quote vague a category. Can you assume that the chessboard will always be entirely in view? Will you have multiple views of the same board? Can you assume that no pieces will be partially or completely occluded in all views?
On the imaged objects and the capture system: will the same chessboard and pieces be used, under very similar illumination? Will the same lens/camera/digitization pipeline be used?
Salut Andrei,
I have done a coin counting algorithm from a picture so the process should be helpful.
The algorithm is called Generalized Hough transform
Make the picture black and white, it is easier that way
Take the image from 1 piece and "slide it over the screenshot"
For each cell you calculate the nr of common pixel in the 2 images
Where you have the largest number there you have the piece
Hope this helps.
Yeah go with Salut Andrei,
Convert the picture into greyscale
Slice into 64 squares and store in array
Using Mat lab can identify the pieces easily
Color can be obtained from Calculating the percentage of No. dot pixels(black pixels)
threshold=no.black pixels /no. of black pixels + no. of white pixels,
If ur value is above threshold then WHITE else BLACK
I'm working on a similar project in c# finding which piece is which isn't the hard part for me. First step is to find a rectangle that shows just the board and cuts everything else out. I first hard-coded it to search for the colors of the squares but would like to make it more robust and reliable regardless of the color scheme. Trying to make it find squares of pixels that match within a certain threshold and extrapolate the board location from that.