Is there anybody who can guide me a bit about Berkeley Word Alignment tool? Specifically, I want to know how another distortion model can be implemented to be used in HMM word alignment.
My first hurdle so far is that running vanilla Tesseract on images of MTG cards doesn't recognize the card title (honestly that's all I need, because I can use that text to pull the rest of the card info from a database). I think the issue might be that I need to train Tesseract to recognize the font used on MTG cards, but I'm wondering if it might instead be an issue with Tesseract not looking for, or not detecting, text in that section of the image (specifically the title).
Edit: including an image of an MTG card for reference: http://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=175263&type=card
Ok so, after asking on reddit programming forums I think I found an answer that I am going to pursue:
The training feature of tesseract is indeed for improving rates for unusual fonts, but that's probably not the reason you have low success.
The environment the text is in is not well controlled - the card background can be a texture in one of five colours plus artifacts and lands. Tesseract greyscales the image before processing, so the contrast between the text and background is not sufficient.
You could put your cards through a preprocessor which mutes coloured areas to white and enhances monotones. That should increase the contrast so tesseract can make out the characters.
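In other words, something like the following OpenCV sketch, if I understand the suggestion correctly (the saturation threshold of 60 is just a guess, not a tuned value):

```python
import cv2

# Load the card scan and convert to HSV so saturation can be isolated.
img = cv2.imread("card.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Pixels with high saturation belong to the coloured background/texture;
# push them to white so only the dark, low-saturation text survives.
saturated = hsv[:, :, 1] > 60  # threshold is a guess; tune per card set
muted = img.copy()
muted[saturated] = 255

# Greyscale and binarize to boost contrast before handing the image to Tesseract.
gray = cv2.cvtColor(muted, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("card_preprocessed.png", binary)
```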
If anyone still following this believes the above path to be the wrong one to start down, please say so.
TLDR
I believe you're on the right track, doing preprocessing.
But you will need to do both: preprocessing AND training Tesseract.
Preprocessing
Basically, you want to get the title text, and only the title text, for Tesseract to read. I suggest you follow the steps below (a rough code sketch follows the list):
Identify the borders of the card.
Cut out the title area for further processing.
Convert the image to black'n'white.
Use contours to identify the exact text area, and crop it out.
Send the image you got to Tesseract.
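A minimal sketch of that pipeline with OpenCV and pytesseract, assuming the card has already been deskewed; the title-box fractions are placeholders, not measured card geometry:

```python
import cv2
import pytesseract

# 1-2. Crop a rough title band from the (already deskewed) card image.
#      These fractions are placeholders, not measured card geometry.
card = cv2.imread("card_deskewed.jpg")
h, w = card.shape[:2]
title = card[int(0.04 * h):int(0.12 * h), int(0.05 * w):int(0.95 * w)]

# 3. Convert to black and white.
gray = cv2.cvtColor(title, cv2.COLOR_BGR2GRAY)
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# 4. Use contours to find the tight bounding box of the text (OpenCV 4.x
#    return order) and crop it out. Assumes at least one contour is found.
contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
x, y, x2, y2 = w, h, 0, 0
for c in contours:
    cx, cy, cw, ch = cv2.boundingRect(c)
    x, y = min(x, cx), min(y, cy)
    x2, y2 = max(x2, cx + cw), max(y2, cy + ch)
text_crop = title[y:y2, x:x2]

# 5. Hand the cropped title to Tesseract (--psm 7 = single text line).
print(pytesseract.image_to_string(text_crop, config="--psm 7"))
```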
How to create a basic preprocessing pipeline is shown in the YouTube video Automatic MTG card sorting: Part 2 - Automatic perspective correction with OpenCV. Also have a look at the third part in that series.
With this said, there are a number of problems you will encounter. How do you handle split cards? Will your algorithm manage white borders? What if the card is rotated or upside down? Just to name a few.
Need for Training
But even if you manage to create a perfect preprocessing algorithm you will still have to train Tesseract. This is due to the special text font used on the cards (which happens to be different fonts depending on the age of the card!).
Consider the card "Kinjalli's Caller".
http://gatherer.wizards.com/Handlers/Image.ashx?multiverseid=435169&type=card
Note how similar the "j" is to the "i". An untrained Tesseract tends to mix them up.
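Once you have trained a model for the card font, using it from code is the easy part. A short sketch with pytesseract, where the "mtg" traineddata file is hypothetical and would be the output of your own training run:

```python
import cv2
import pytesseract

# "mtg" is a hypothetical traineddata file produced by your own Tesseract
# training run and placed in the tessdata directory; it does not ship with
# Tesseract. --psm 7 treats the crop as a single line of text.
title_crop = cv2.imread("title_crop.png")
text = pytesseract.image_to_string(title_crop, lang="mtg", config="--psm 7")
print(text)
```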
Conclusion
All this considered, my answer to you is that you need to do both preprocessing of the card image AND to train Tesseract.
If you're still interested I would like to suggest that you have a look at this MTG-card reading project on GitHub. That way you don't have to reinvent the wheel.
https://github.com/klanderfri/ReadMagicCard
I am developing an image recognition algorithm that finds characters on dirty panels from the real world. Specifically, the image is a car registration plate containing letters, digits and mud.
The algorithm must classify characters into two classes: alphabetic characters and digits. Is it possible to train an LBP or Haar cascade to discriminate between the two classes, and will the training result be stable given the variety of digit shapes?
Could you explain briefly or recommend a better method, please?
"The algorithm must classify characters into two classes: alphabet characters and digits.” - you forgot mud and background though technically you can add them to a broad category “other”. Haars cascades are used for something like face detection since they typically approximate wavelets on the middle spatial scale where faces have characteristic features. Your problem is different.You need to first understand your problem structure, read the literature and only then try to use a sheer force of learning algorithms. This book actually talks a bit about people starting to think about method first instead of analyzing the problem which is not always a good idea.
Technically you first need to find the text in the image, which can be more challenging than recognizing it, given the current state-of-the-art OCR that is typically used as a library rather than created from scratch. To find text in the image I suggest you first do adaptive thresholding to create a binary map (1 = foreground, i.e. letters and numbers, 0 = background), then perform connected-components analysis on the foreground, coupled with the SWT (stroke width transform): http://research.microsoft.com/pubs/149305/1509.pdf
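A rough sketch of that first stage (adaptive threshold plus connected components) with OpenCV; the size/aspect filter at the end only stands in for the stroke-width check, it is not the full SWT from the paper:

```python
import cv2

plate = cv2.imread("plate.jpg", cv2.IMREAD_GRAYSCALE)

# Adaptive threshold: foreground (strokes) becomes white, background/mud black.
binary = cv2.adaptiveThreshold(plate, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 31, 10)

# Connected components give one label per candidate character blob.
n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)

candidates = []
for i in range(1, n):  # label 0 is the background
    x, y, w, h, area = stats[i]
    # Crude size/aspect filter standing in for the stroke-width check;
    # real SWT would inspect stroke thickness along gradient directions.
    if 10 < h < plate.shape[0] and 0.2 < w / h < 1.2 and area > 50:
        candidates.append((x, y, w, h))

print(f"{len(candidates)} character candidates found")
```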
We need a calibration pattern with outer dimensions of around 600 mm x 600 mm.
I tried to use the Python script which can be found in the docs folder of the OpenCV distribution, but it does not generate an SVG of this size. It stops without an error message and does not write an SVG file.
So I want to create the pattern on my own and want to understand the "rules":
is it better to use a different number of rows and columns?
how many circles do I need for a good calibration pattern?
which radius should I use in relation to the outer dimensions?
which spacing is needed between the circles?
which spacing is needed between the outer circles and the border of the whole pattern?
Because I cannot print a pattern of this size myself and have to pay for the printing, I need to know the rules and cannot just try many different things.
Thanks!
Two links that may give some ideas: one is from the Carnegie Mellon University website and the other is a paper from Janne Heikkila (see p. 7).
Disclaimer: I don't know about these patterns being optimized or not. I am also interested in learning more about this.
Edit: one more hint from the OpenCV findCirclesGrid documentation: The function requires white space (like a square-thick border, the wider the better) around the board to make the detection more robust in various environments.
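For completeness, a small sketch of how the printed pattern would later be detected with findCirclesGrid; the 4x11 asymmetric grid size here is only an example, not a recommendation for your 600 mm pattern:

```python
import cv2

img = cv2.imread("calib_photo.jpg", cv2.IMREAD_GRAYSCALE)

# Grid size (circles per row, number of rows) must match the printed pattern;
# 4x11 asymmetric is only an example here.
pattern_size = (4, 11)
found, centers = cv2.findCirclesGrid(img, pattern_size,
                                     flags=cv2.CALIB_CB_ASYMMETRIC_GRID)

if found:
    # Visualize the detected circle centers for a quick sanity check.
    vis = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    cv2.drawChessboardCorners(vis, pattern_size, centers, found)
    cv2.imwrite("detected.png", vis)
else:
    print("Pattern not found - check lighting and the white border")
```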
I have built an OCR application for normal handwritten characters. For the segmentation of characters I have used the histogram profile method. That successfully works for normal English characters.
I have used horizontal projection for line segmentation and vertical projection for character segmentation.
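A simplified sketch of that projection-profile method (this is roughly what I do, with illustrative thresholds):

```python
import cv2
import numpy as np

page = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
_, ink = cv2.threshold(page, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Horizontal projection: ink per row; gaps between ink rows separate lines.
row_has_ink = ink.sum(axis=1) > 0

# Collect (start, end) ranges of consecutive ink rows as text lines.
lines, start = [], None
for y, has_ink in enumerate(row_has_ink):
    if has_ink and start is None:
        start = y
    elif not has_ink and start is not None:
        lines.append((start, y))
        start = None
if start is not None:
    lines.append((start, len(row_has_ink)))

# Vertical projection within one line: zero-ink columns separate characters.
y0, y1 = lines[0]
col_profile = ink[y0:y1].sum(axis=0)
print("candidate cut columns:", np.where(col_profile == 0)[0])
```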
To segment the lines of a cursive handwritten article I can use horizontal projection as before. But I can't use the same methodology for cursive English character segmentation, since the characters are merged with each other and also slanted. Can anyone please suggest a way to segment cursive characters?
This is a difficult problem to solve due to the variability between writers and character shapes. One option, which has achieved up to 83% accuracy, is to analyze the ligatures (connections between characters) in the writing and draw columns on the image using those ligatures as base points. A 2013 paper in Procedia Computer Science proposed this approach and published results on this particular problem: https://ac.els-cdn.com/S1877050913001464/1-s2.0-S1877050913001464-main.pdf?_tid=5f55eac2-0077-11e8-9d79-00000aacb35f&acdnat=1516737513_c5b6e8cb8184f69b2d10f84cd4975d56
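To make the idea concrete, here is a very simplified sketch of treating low-ink columns (likely ligatures) as cut candidates; this only illustrates the principle and is not the algorithm from the cited paper:

```python
import cv2
import numpy as np

# Columns whose ink count is low but non-zero are likely thin connecting
# strokes (ligatures) and therefore candidate cut points.
word = cv2.imread("word.png", cv2.IMREAD_GRAYSCALE)
_, ink = cv2.threshold(word, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

col_ink = (ink > 0).sum(axis=0)
ligature_threshold = max(2, int(0.15 * col_ink.max()))  # heuristic, not from the paper
cut_candidates = np.where((col_ink > 0) & (col_ink <= ligature_threshold))[0]
print("candidate ligature columns:", cut_candidates)
```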
Another approach to try is called skeletal analysis, which takes the word as a whole, matches its shape against other known word shapes, and predicts the word from the entire image.
Good luck!
I am currently doing a project on iOS. The app I am developing is supposed to take a picture of a check (cheque?) and read the CMC7 number which is written at the bottom of the check.
Currently, I am working on it with OpenCV, because of the work that was previously done on the project before I arrived, but:
Is OpenCV better than Tesseract for that kind of job?
The difficulty here lies in the font that is used, which is this one:
http://www.dafont.com/fr/cmc7.font
As you can imagine, ordinary OCR can't recognize this font because of its shape. I think that the best way to do this job is to use the barcode part of the font to recognize it, rather than the shape of the characters.
The thing is that, from what I know, Tesseract can recognize different kinds of fonts and we can train it on a specific font, but what about this font that is used for CMC7?
If I want to work on the barcode, is there a way to do it with Tesseract, or can it only be used for font recognition?
We have the same problem. I don't think it is possible to obtain features from CMC7 in a barcode manner, because the stroke heights and positions differ inside the number placeholder. I'm not familiar with Tesseract, but for all types of correlators you need to choose features that strongly define the categories with low variance between samples within a category. We are thinking of using scale-invariant features like LBP, HOG or eigenvectors to mitigate the data loss after interpolation.
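As a rough sketch of that feature-based direction, one could compute HOG descriptors over fixed-size character crops and feed them to an SVM; every size and parameter below is illustrative, not tuned for CMC7:

```python
import cv2
import numpy as np

# HOG over a fixed-size character crop; window/block/cell sizes are illustrative.
hog = cv2.HOGDescriptor((32, 32), (16, 16), (8, 8), (8, 8), 9)

def describe(char_img):
    """Resize a single grayscale character crop and return its HOG feature vector."""
    resized = cv2.resize(char_img, (32, 32))
    return hog.compute(resized).flatten()

def train(train_images, labels):
    """train_images: list of grayscale crops; labels: integer class per CMC7 symbol."""
    features = np.array([describe(img) for img in train_images], dtype=np.float32)
    svm = cv2.ml.SVM_create()
    svm.setType(cv2.ml.SVM_C_SVC)
    svm.setKernel(cv2.ml.SVM_LINEAR)
    svm.train(features, cv2.ml.ROW_SAMPLE, np.array(labels, dtype=np.int32))
    return svm

def predict(svm, char_img):
    """Return the predicted class index for one character crop."""
    _, result = svm.predict(np.array([describe(char_img)], dtype=np.float32))
    return int(result[0][0])
```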