cursive character segmentation in OCR - opencv

I have built an OCR application for normal handwritten characters. For the segmentation of characters I used the histogram profile method, which works successfully for normal English characters.
I have used horizontal projection for line segmentation and vertical projection for character segmentation.
To segment lines of a cursive handwritten article I can use horizontal projection as before. But I can't use the same methodology for cursive English character segmentation, since the characters are merged with each other and also slanted. Can anyone please suggest a way to segment cursive characters?

This is a difficult problem to solve due to the variability between writers and character shapes. One option, which has achieved up to 83% accuracy, is to analyze the ligatures (the connections between characters) in the writing and draw segmentation columns on the image using those ligatures as base points. A 2013 paper in Procedia Computer Science proposed this approach and published results on this particular problem: https://ac.els-cdn.com/S1877050913001464/1-s2.0-S1877050913001464-main.pdf?_tid=5f55eac2-0077-11e8-9d79-00000aacb35f&acdnat=1516737513_c5b6e8cb8184f69b2d10f84cd4975d56
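As a rough illustration of the general idea only (not the paper's actual method, which also handles slant correction and ligature validation), ligature candidates can be found as columns where the vertical projection is non-zero but very thin:

    import cv2
    import numpy as np

    def ligature_cut_candidates(word_gray, max_ink=2):
        # Binarize with Otsu: ink -> 1, background -> 0.
        _, binary = cv2.threshold(word_gray, 0, 1,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        profile = binary.sum(axis=0)   # vertical projection
        # Ligatures are usually a single thin stroke, so columns with a
        # small non-zero ink count are candidate cut points.
        return np.where((profile > 0) & (profile <= max_ink))[0]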
Another approach to try is called skeletal analysis, which takes the word as a whole, matches its shape against other known word shapes, and predicts the word from the entire image.
Good luck!

Related

How to differentiate 2 classes: digits and "other letters and noise" on an image?

I'm developing an image recognition algorithm that finds characters on dirty panels from the real world. The image is a car registration plate containing letters, digits and mud.
The algorithm must classify characters into two classes: alphabet characters and digits. Is it possible to train an LBP or Haar cascade to discriminate between the two classes, and will the training result be stable given the variety of digit shapes?
Could you explain briefly or recommend better method, please?
"The algorithm must classify characters into two classes: alphabet characters and digits.” - you forgot mud and background though technically you can add them to a broad category “other”. Haars cascades are used for something like face detection since they typically approximate wavelets on the middle spatial scale where faces have characteristic features. Your problem is different.You need to first understand your problem structure, read the literature and only then try to use a sheer force of learning algorithms. This book actually talks a bit about people starting to think about method first instead of analyzing the problem which is not always a good idea.
Technically you first need to find the text in the image which can be more challenging than recognizing it given the current state of art OCR that is typically used as a library rather than created from scratch. To find text in the image I suggest first do adaptive thresholding to create a binary map (1-foreground that is letters and numbers and 0 is background), then perform connected components on the foreground coupled with SWT (stroke width transform) http://research.microsoft.com/pubs/149305/1509.pdf
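A minimal OpenCV sketch of the thresholding and connected-components steps (the SWT itself is not in OpenCV's core API, so it is left out; the file name and area bounds are hypothetical and need tuning):

    import cv2

    img = cv2.imread("plate.jpg", cv2.IMREAD_GRAYSCALE)

    # Adaptive threshold: characters -> 255 (foreground), background -> 0.
    binary = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, 31, 10)

    # Connected components with per-component statistics.
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)

    # Keep components whose area is plausible for a character; tune the
    # bounds to your image resolution.
    chars = [i for i in range(1, n) if 50 < stats[i, cv2.CC_STAT_AREA] < 5000]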

OCR detection with openCV

I'm trying to create a simple OCR engine using openCV. I have this image: https://dl.dropbox.com/u/63179/opencv/test-image.png
I have saved all possible characters as images and am trying to detect these images in the input image.
From here I need to identify the code. I have been trying matchTemplate and FAST detection. Both seem to fail (or more likely: I'm doing something wrong).
When I used the matchTemplate method I found the edges of both the input image and the reference images using Sobel. This provides a working result but the accuracy is not good enough.
When using the FAST method it seems like I can't get any interesting descriptors from the cvExtractSURF method.
Any recommendations on the best way to read this kind of code?
Update 1 (2012-03-20)
I have had some progress. I'm trying to find the bounding rects of the characters but the matrix font is killing me. See the samples below:
My font: https://dl.dropbox.com/u/63179/opencv/IMG_0873.PNG
My font filled in: https://dl.dropbox.com/u/63179/opencv/IMG_0875.PNG
Other font: https://dl.dropbox.com/u/63179/opencv/IMG_0874.PNG
As seen in the samples, I find the bounding rects for the less complex font, and if I can fill in the space between the dots in my font it also works. Is there a way to achieve this with opencv? If I could find the bounding box of each character it would be much simpler to recognize the character.
Any ideas?
Update 2 (2013-03-21)
Ok, I had some luck with finding the bounding boxes. See image:
https://dl.dropbox.com/u/63179/opencv/IMG_0891.PNG
I'm not sure where to go from here. I tried to use matchTemplate, but I guess that is not a good option in this case? I guess it is better suited to searching for an exact match in a bigger picture?
I tried to use surf but when I try to extract the descriptors with cvExtractSURF for each bounding box I get 0 descriptors... Any ideas?
What method would be most appropriate to use to be able to match the bounding box against a reference image?
You're going the hard way with FAST+SURF, because they were not designed for this task.
In particular, FAST detects corner-like features that are ubiquitous in structure-from-motion but far less present in OCR.
Two suggestions:
maybe build a feature vector from the number and locations of FAST keypoints; I think that you can rapidly check whether these features are discriminant enough, and if yes, train a classifier from that
(the one I would choose myself) partition your image samples into smaller squares. Compute only the SURF descriptor for each square and concatenate all of them to form the feature vector for a given sample. Then train a classifier with these feature vectors (see the sketch after the note below).
Note that option 2 works with any descriptor that you can find in OpenCV (SIFT, SURF, FREAK...).
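A minimal sketch of option 2, assuming a grayscale sample image. SIFT stands in for SURF here, since SURF is non-free and lives in opencv-contrib in recent builds, but any descriptor with a compute() method works the same way:

    import cv2

    def grid_feature_vector(gray, grid=4, patch_size=8):
        h, w = gray.shape[:2]
        # One keypoint at the centre of each grid cell, with a fixed size,
        # so every sample yields the same-length vector. Keypoints too
        # close to the border may be dropped; pad the image if needed.
        kps = [cv2.KeyPoint(w * (j + 0.5) / grid,
                            h * (i + 0.5) / grid, patch_size)
               for i in range(grid) for j in range(grid)]
        sift = cv2.SIFT_create()
        _, desc = sift.compute(gray, kps)
        return desc.reshape(-1)   # concatenated descriptors = feature vector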
Answer to update 1
Here is a little trick that senior people taught me when I started.
On your image with the dots, you can project your binarized data to the horizontal and vertical axes.
By searching for holes (disconnections) in the projected patterns, you are likely to recover almost all the bounding boxes in your example.
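A sketch of that trick in Python/OpenCV, with a morphological closing applied first so the gaps between the dots of the matrix font don't split characters (the file name is hypothetical):

    import cv2
    import numpy as np

    img = cv2.imread("code.png", cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(img, 0, 1,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Close the gaps between the dots of the matrix font.
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE,
                              np.ones((3, 3), np.uint8))

    def runs(profile):
        """[start, end) intervals where a projection profile is non-zero."""
        ink = (profile > 0).astype(int)
        d = np.diff(np.concatenate(([0], ink, [0])))
        return list(zip(np.flatnonzero(d == 1), np.flatnonzero(d == -1)))

    boxes = []
    for top, bottom in runs(binary.sum(axis=1)):     # horizontal projection
        line = binary[top:bottom]
        for left, right in runs(line.sum(axis=0)):   # vertical projection
            boxes.append((left, top, right, bottom))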
Answer to update 2
At this point, you're back to my initial answer: SURF will be of no use here.
Instead, a standard way is to binarize each bounding box (to 0 - 1 depending on background/letter), normalize the bounding boxes to a standard size, and train a classifier from here.
There are several tutorials and blog posts on the web about how to do digit recognition using neural networks or SVM's, you just have to replace digits by your letters.
Your work is almost done! Training and using a classifier is tedious but straightforward.
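A minimal sketch of that pipeline with OpenCV's built-in k-NN; `samples` and `test_box` are hypothetical placeholders for your labelled bounding-box images:

    import cv2
    import numpy as np

    SIZE = 16   # normalized box size; pick the smallest that stays readable

    def to_vector(gray_patch):
        _, b = cv2.threshold(gray_patch, 0, 1,
                             cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        return cv2.resize(b, (SIZE, SIZE)).astype(np.float32).reshape(-1)

    # `samples` = list of (bounding-box image, integer label) pairs.
    X = np.stack([to_vector(img) for img, _ in samples])
    y = np.array([label for _, label in samples], dtype=np.int32)

    knn = cv2.ml.KNearest_create()
    knn.train(X, cv2.ml.ROW_SAMPLE, y)
    _, pred, _, _ = knn.findNearest(to_vector(test_box).reshape(1, -1), k=3)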

OCR: segmentation of small text

The problem
I've been building a (very) simple OCR engine.
Since I'm trying to classify very small (pixel size) characters, I'm having some difficulties on segmentation. Here's an example, after best-effort image-wide thresholding:
What I've tried
Error detection:
large horizontal size of the segments. It works, mostly, but fails (false positives) for a few larger characters.
classify, and reject on low score. This seems a bit wasteful.
Error correction:
add pixels vertically (vertical histogram), find the minimum. It cuts many segments in the wrong place, in many of the samples.
What I haven't tried yet
Trying to classify on all possible segmentation points (pixels). This would be very wasteful, and difficult to extend to a 3-merged-characters segment.
I've been reading up on morphology approaches to turn the characters into mathematical curves, but I don't really know where to start, or whether it's worth the effort.
Where to go from here?
I have no idea. Hence this question :)
Lean back and half close your eyes.
63 :-)
Now, if only it was so easy for a computer!
It's tantalisingly close to what double-patterning does (or un-does?) in silicon masks.
I would suggest oversampling (doubling or quadrupling the pixel count in each axis), filtering (probably low pass - or possibly bandpass where the passband = spatial frequency of a line), re-thresholding until they separate. Expensive, so only apply in problem areas.
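Something like this, as a rough sketch (the file name and the blur sigma are guesses to tune; the Gaussian blur plays the role of the low-pass filter):

    import cv2

    img = cv2.imread("small_text.png", cv2.IMREAD_GRAYSCALE)

    # Oversample 4x in each axis, smooth, then re-threshold. Tune the
    # sigma to the spatial frequency of the strokes.
    big = cv2.resize(img, None, fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
    smooth = cv2.GaussianBlur(big, (0, 0), 2.0)
    _, rethresh = cv2.threshold(smooth, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)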
Reinvent your problem so you do not need segmentation.
Really, at this scale I think you'd better invest in other approaches. For example, if you OCR text (do you?) you can use the information of lines (character height). There are not many fonts that can be used for small (yet readable) characters. My approach would be an algorithm that scans in scanlines (from left to right, taking pixels from top to bottom) and tries to find correlations between trained text and the scanlines (n, n-1, ... n-x).
And you probably need the information in the grayscale levels as well, so it's better not to threshold the images.

How to detect exact, predefined shapes with hough transform, like a "W"?

Let's say I have some system that scans documents, where all documents use the same font and font size.
In these documents, there will always be the same looking letter "W". Let's say it is always 20 px large. How can I set up the hough transform to recognize this letter "W" at 20 px large in my documents?
A quick Google search yields the following information of interest:
Generalizing the Hough Transform to Detect Arbitrary Shapes
and what looks like a lecture using the above paper as its source.
Also, if it's an actual "W", would an OCR engine like Tesseract be better suited to your needs?
The Hough transform for lines finds best fit line equations. You would need to do additional processing to find just the line segments. If the character thickness is several pixels, then to effectively find lines you might want to reduce the thickness to one pixel. There are techniques to do that, but also various algorithmic traps.
Once you have your line segments, you would still have to write an algorithm to identify characters based on the relative position and angle of the line segments. It's harder than it first appears.
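If you do go the Hough route, here is a rough sketch of that pipeline; note that the thinning step needs the opencv-contrib build, and the Hough parameters are guesses to tune:

    import cv2
    import numpy as np

    img = cv2.imread("w.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
    _, binary = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Reduce strokes to ~1 px so the Hough transform sees clean lines.
    # cv2.ximgproc.thinning requires the opencv-contrib-python package.
    thin = cv2.ximgproc.thinning(binary)

    # Probabilistic Hough returns finite segments, not whole-line equations.
    segments = cv2.HoughLinesP(thin, rho=1, theta=np.pi / 180,
                               threshold=10, minLineLength=8, maxLineGap=2)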
A normalized cross-correlation (template matching) could work if you're certain that the image will always be in a certain rotation, the characters will always be the same size, etc. But even for scans you'll see some rotation and some variation in contrast.
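A minimal template-matching sketch under those assumptions (fixed scale and rotation; the file names and the 0.8 score cut-off are hypothetical):

    import cv2

    page = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
    tmpl = cv2.imread("w_20px.png", cv2.IMREAD_GRAYSCALE)  # 20 px "W"

    # Normalized cross-correlation; scores in [-1, 1], robust to uniform
    # brightness changes but not to rotation or scale.
    scores = cv2.matchTemplate(page, tmpl, cv2.TM_CCOEFF_NORMED)
    ys, xs = (scores > 0.8).nonzero()   # candidate top-left corners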
All that aside, it's likely cheaper in the long run to use a commercial OCR package or reasonably good open source project. OCR is hard to implement if you're not already familiar with image processing.

Scoreboard digit recognition using OpenCV

I am trying to extract numbers from a typical scoreboard that you would find at a high school gym. I have each number in a digital "alarm clock" font and have managed to perspective-correct, threshold and extract a given digit from the video feed.
Here's a sample of my template input
My problem is that no one classification method will accurately determine all digits 0-9. I have tried several methods
1) Tesseract OCR - this one consistently messes up on 4 and frequently returns weird results. Just using the command line version. If I actually try to train it on an "alarm clock" font, I get unknown character every time.
2) kNearest with OpenCV - I search a database consisting of my template images (0-9) and see which one is nearest. I frequently get confusion between 3/1 and 7/1
3) cvMatchShapes - this one is fairly bad, it usually can't tell the difference between 2 of the digits for each input digit
4) Tangent Distance - This one is the closest, but the smallest tangent distance between the input and my templates ends up mapping "7" to "1" every time
I'm really at a loss to get a classification algorithm for such a simple problem. I feel I have cleaned up the input fairly well and it's a fairly simple case for classification but I can't get anything reliable enough to actually use in practice. Any ideas about where to look for classification algorithms, or how to use them correctly would be appreciated. Am I not cleaning up the input? What about a better input database? I don't know what else I'd use for input, each digit and template looks spot on at this point.
The classical digit recognition approach, which should work well in this case, is to crop the image just around the digit and resize it to 4x4 pixels.
A Discrete Cosine Transform (DCT) can be used to further slim down the search space. You could select the first 4-6 values.
With those values, train a classifier. SVM is a good one, readily available in OpenCV.
It is not as simple as Emma's or Martin's suggestions, but it's more elegant and, I think, more robust.
Given the width/height ratio of your input, you may choose a different resolution, like 3x4. Choose the smallest one that retains readable digits.
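A sketch of that pipeline with OpenCV's ml module; `templates`, `labels` and `test_digit` are hypothetical placeholders for your cropped training digits and their ground-truth values:

    import cv2
    import numpy as np

    N_COEFF = 6   # keep the first 4-6 low-frequency DCT values

    def dct_features(digit_img, size=4):
        # cv2.dct only supports even-sized arrays, so stick to 4x4 here.
        patch = cv2.resize(digit_img, (size, size)).astype(np.float32)
        coeffs = cv2.dct(patch)
        # First few coefficients in row-major order; a zigzag scan would
        # be the more standard ordering.
        return coeffs.reshape(-1)[:N_COEFF]

    X = np.stack([dct_features(t) for t in templates])
    y = np.array(labels, dtype=np.int32)

    svm = cv2.ml.SVM_create()
    svm.setType(cv2.ml.SVM_C_SVC)
    svm.setKernel(cv2.ml.SVM_LINEAR)
    svm.train(X, cv2.ml.ROW_SAMPLE, y)
    _, pred = svm.predict(dct_features(test_digit).reshape(1, -1))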
Given the highly regular nature of your input, you could define a set of 7 target areas of the image to check. Each area should encompass some significant portion of one of the 7 segments of each digit of the display, but not overlap.
You can then check each area and average the color / brightness of the pixels in it to generate a probability for a given binary state. If your probability is high on all areas you can then easily figure out what the digit is.
It's not as elegant as a pure ML type algorithm, but ML is far more suited to inputs which are not regular, and in this case that does not seem to apply - so you trade elegance for accuracy.
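A minimal sketch of that idea; the region coordinates are rough guesses you would tune to your display:

    import numpy as np

    # Fraction-of-bounding-box sample regions for the 7 segments, as
    # (y0, y1, x0, x1); rough guesses to tune.
    SEGMENTS = {
        "top":          (0.00, 0.15, 0.25, 0.75),
        "top_left":     (0.15, 0.45, 0.00, 0.20),
        "top_right":    (0.15, 0.45, 0.80, 1.00),
        "middle":       (0.42, 0.58, 0.25, 0.75),
        "bottom_left":  (0.55, 0.85, 0.00, 0.20),
        "bottom_right": (0.55, 0.85, 0.80, 1.00),
        "bottom":       (0.85, 1.00, 0.25, 0.75),
    }
    ORDER = list(SEGMENTS)

    # Lit-segment patterns for digits 0-9, in the same order as ORDER.
    DIGITS = {
        (1,1,1,0,1,1,1): 0, (0,0,1,0,0,1,0): 1, (1,0,1,1,1,0,1): 2,
        (1,0,1,1,0,1,1): 3, (0,1,1,1,0,1,0): 4, (1,1,0,1,0,1,1): 5,
        (1,1,0,1,1,1,1): 6, (1,0,1,0,0,1,0): 7, (1,1,1,1,1,1,1): 8,
        (1,1,1,1,0,1,1): 9,
    }

    def read_digit(binary):           # binary: lit pixels = 255, off = 0
        h, w = binary.shape
        state = []
        for name in ORDER:
            y0, y1, x0, x1 = SEGMENTS[name]
            region = binary[int(y0*h):int(y1*h), int(x0*w):int(x1*w)]
            state.append(int(region.mean() > 127))   # segment "on"?
        return DIGITS.get(tuple(state))              # None if no match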
Might sound silly, but have you tried simply checking for black bars vertically, and then horizontally in the top and bottom halves, left and right of the centerline?
If you are trying text recognition with Tesseract, try passing not one digit but a number of duplicated digits; sometimes it can produce better results, here's the example.
However, if you're planning business software, you may want to have a look at a commercial OCR SDK. For example, try ABBYY FineReader Engine. It's not affordable for free-to-use applications, but when it comes to business, it can add good value to your product. As far as I know, ABBYY provides the best OCR quality; for example, check out http://www.splitbrain.org/blog/2010-06/15-linux_ocr_software_comparison
You want your scoreboard image inputs S feeding an algorithm that maps them to {0,1,2,3,4,5,6,7,8,9}.
Let V denote the set of n-tuples of integers.
Construct an algorithm α that maps each image S to an n-tuple (k1, k2, ..., kn) that can differentiate between two different scoreboard digits.
If you can specify the range of α then you only have to collect the vectors in V that correspond to a digit in order to solve the problem.
I've applied this idea, building on Martin Beckett's suggestion, and it works. My initial attempt was a simple injection into a 2-tuple by vertical left-to-right summing, with the first integer an image column offset and the second integer the length of a 'nice' vertical line.
This did not work: the images for 6 and 8 would map to the same vectors. So I needed another mini-info-capture for my digit input types (they are not from a scoreboard), and a 3-tuple info vector does the trick.
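For illustration, here is a sketch of one such mapping; the third component (the number of distinct ink runs in the chosen column) is just one possible mini-info-capture, not necessarily the one I used:

    import numpy as np

    def run_tuple(binary):
        """Map a binarized digit image (ink > 0) to a 3-tuple:
        (column offset, length of longest vertical run, run count)."""
        best = (0, 0, 0)
        for x in range(binary.shape[1]):
            col = binary[:, x] > 0
            d = np.diff(np.concatenate(([0], col.astype(int), [0])))
            starts = np.flatnonzero(d == 1)
            ends = np.flatnonzero(d == -1)
            if starts.size == 0:
                continue
            lengths = ends - starts
            if lengths.max() > best[1]:
                best = (x, int(lengths.max()), int(starts.size))
        return best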
