I am looking for a technique or a known method to find similar segments within handwritten text.
It's a kind of image retrieval, but rather than searching for an entire word or character, I want to find parts of strokes similar to a pattern given as an input image.
The figure below illustrates the process: the red segments are the input images, and the red rectangles mark the parts of the text similar to the input.
By "similar" I mean approximate, not exact, matching.
Thanks in advance.
I believe Shape Context can be helpful for your task. You can read more about it here.
The shape context descriptor allows you to robustly describe a local shape. Finding two shape-context descriptors that are nearly identical strongly suggests that the underlying text patterns are quite similar.
You can find a Matlab implementation on the project's home page.
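For a quick experiment, here is a minimal sketch using OpenCV's shape module (an independent implementation of shape context, not the Matlab code from the project page; the module may not be present in every OpenCV build, and the file names are placeholders):

```python
import cv2

def shape_context_distance(pattern_path, candidate_path):
    """Compare the dominant contour of two binary pattern images."""
    def largest_contour(path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
        # For long strokes you may want to subsample the contour points for speed.
        return max(contours, key=cv2.contourArea)

    extractor = cv2.createShapeContextDistanceExtractor()
    # Lower distance = more similar shapes; set a threshold empirically for
    # "approximate" matches rather than exact ones.
    return extractor.computeDistance(largest_contour(pattern_path),
                                     largest_contour(candidate_path))
```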
I'm trying to detect objects and text in a hand-drawn diagram.
My goal is to be able to "parse" something like this into an object structure for further processing.
My first aim is to detect text, lines and boxes (arrows etc. are not important (for now ;))
I can do dilation, erosion, Otsu thresholding, inversion, etc., and easily get to something like this
What I need some guidance on is the next steps.
I have several ideas:
Contour Analysis
OCR using UNIPEN
Edge detection
Contour Analysis
I've been reading about "Contour Analysis for Image Recognition in C#" on CodeProject, which could be a great way to recognize boxes etc., but my issue is that the boxes are connected and therefore do not form separate objects to match against a template.
Therefore I need some advice on whether this is a feasible way to go.
OCR using UNIPEN
I would like to use UNIPEN (see "Large pattern recognition system using multi neural networks" on CodeProject) to recognize handwritten letters and then "remove" them from the image leaving only the boxes and lines.
Edge detection
Another way could be to detect all lines and corners and in that way infer the boxes and lines that the image consists of. In that case, ideas on how to straighten the lines and find the 90-degree corners would be helpful.
Generally, I think I just need some pointers on which strategy to apply, not code samples (though it would be great ;))
I will try to answer the contour analysis part and the question of the lines between the boxes.
If you need to turn the interconnected boxes into separate objects, that can be achieved easily enough:
close the gaps in the box edges with morphological closing
perform connected components labeling and look for compact objects (e.g. objects whose area is close to the area of their bounding box)
You will get the insides of the boxes. These can be elliptical, rectangular, or any shape you may find in common diagrams; the contour analysis can tell you which. A problem may arise for enclosed background areas (e.g. the space between the ABC links in your example diagram). You might eliminate these on the criterion that their bounding box overlaps with multiple other objects' bounding boxes.
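As a rough sketch of the closing and labeling steps with OpenCV (the kernel size, the area cutoff and the 0.75 compactness threshold are guesses you would tune for your drawings; "diagram.png" is a placeholder):

```python
import cv2
import numpy as np

# Binarise so that strokes are white (255) on black (0)
bw = cv2.imread("diagram.png", cv2.IMREAD_GRAYSCALE)
_, bw = cv2.threshold(bw, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# 1. Close small gaps so the box outlines become watertight
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
closed = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, kernel)

# 2. Label the enclosed regions: the insides of the boxes are holes in the
#    stroke mask, so label the inverted image and keep the compact components
n, labels, stats, _ = cv2.connectedComponentsWithStats(cv2.bitwise_not(closed))
boxes = []
for i in range(1, n):                       # label 0 is the stroke mask itself
    x, y, w, h, area = stats[i]
    if x == 0 or y == 0 or x + w == closed.shape[1] or y + h == closed.shape[0]:
        continue                            # outer background region
    if area > 100 and area / float(w * h) > 0.75:
        boxes.append((x, y, w, h))          # "area close to bounding-box area"
```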
Now find line segments with HoughLinesP. If a segment finishes or starts within a certain distance of the edge of one of the objects, you can assume it is connected to that object.
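A sketch of that connector step, assuming boxes holds the (x, y, w, h) rectangles found above and lines_img is the binary image with the box interiors masked out; the Hough parameters and the 10-pixel tolerance are only starting points:

```python
import cv2
import numpy as np

# lines_img: binary image with box interiors removed, strokes white on black
segments = cv2.HoughLinesP(lines_img, 1, np.pi / 180, threshold=40,
                           minLineLength=30, maxLineGap=10)

def nearest_box(pt, boxes, tol=10):
    """Index of the box whose (slightly expanded) rectangle contains pt, or None."""
    px, py = pt
    for i, (x, y, w, h) in enumerate(boxes):
        if x - tol <= px <= x + w + tol and y - tol <= py <= y + h + tol:
            return i
    return None

connections = []
if segments is not None:
    for x1, y1, x2, y2 in segments[:, 0]:
        a, b = nearest_box((x1, y1), boxes), nearest_box((x2, y2), boxes)
        if a is not None and b is not None and a != b:
            connections.append((a, b))      # segment links box a to box b
```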
As an added touch you could try to detect arrow ends on either side by checking the width profile of the line segments in a neighbourhood of their endpoints.
It is an interesting problem; I will try to remember it and give it to my students to cut their teeth on.
I'm interested in taking user stroke input (i.e. drawing with an iPad) and classifying it as either text or a drawing (or, I suppose, just non-text), in whatever capacity is reasonably feasible. I'm not expecting a pre-built library for this, I'm just having a hard time finding any papers or algorithmic resources about this.
I don't need to detect what the text is that they're drawing, just whether it's likely text or not.
I would think you will need to generate probabilities for which text character the input might be. If the highest-probability character is below some threshold, classify the stroke as a drawing.
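As a sketch of that thresholding idea (the recognize_char classifier is hypothetical, standing in for whatever character recognizer you use, and the 0.4 threshold would need calibrating on labelled strokes):

```python
def classify_stroke(stroke_points, recognize_char, threshold=0.4):
    """Return 'text' if some character explains the stroke well enough, else 'drawing'."""
    probs = recognize_char(stroke_points)   # e.g. {'a': 0.05, 'b': 0.61, ...}
    best_char = max(probs, key=probs.get)
    return "text" if probs[best_char] >= threshold else "drawing"
```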
This is a possibly useful paper: http://arxiv.org/pdf/1304.0421v1.pdf (if only for its reference list). Also, the first hit on this Google Scholar search looks relevant: http://scholar.google.com/scholar?q=classification+stroke+input+text+or+drawing
I am trying to determine the best method to extract handwritten data from a scanned document.
The handwritten data is in specific boxed areas. I generated the digital version of the document, so I know the co-ordinates of the boxed areas, and I could also generate additional variations of the document if need be (e.g. a version that is masked to make the fields easier to extract).
The reason I can't just extract the fields using the co-ordinates from document generation is there is shifting/scaling/perspective modifications which are occurring during the scanning process, which can push/pull the co-ordinates for each individual box differently (the scanned document does have corner markers used for alignment, but even so unintended transformations commonly take place).
At a high level, I assume there are two ways to address this issue: step through the co-ordinates of each box on the page and attempt to "correct" them with some technique/algorithm, or compare a completed form with a blank (masked?) form and try to extract the correct fields that way.
What is the most efficient technique / algorithm to adjust for these modifications and accurately extract the areas which contain handwriting? Are there other options?
There are many techniques that can achieve nearly 100% accuracy for your problem.
Just follow the steps described on this page: http://www.codeproject.com/Articles/24809/Image-Alignment-Algorithms. In short, you first compute the optical flow between the two images and then estimate the transformation that produces that optical flow.
Note: this approach works best when matched images are almost identical.
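The linked article covers Lucas-Kanade style alignment; a comparable option in OpenCV, offered here only as an alternative sketch, is ECC alignment via cv2.findTransformECC (OpenCV >= 4.1 signature shown), which likewise assumes the two images are nearly identical. The file names are placeholders:

```python
import cv2
import numpy as np

template = cv2.imread("blank_form.png", cv2.IMREAD_GRAYSCALE)
scan = cv2.imread("scanned_form.png", cv2.IMREAD_GRAYSCALE)

warp = np.eye(2, 3, dtype=np.float32)
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)

# Estimate an affine warp that maximises the correlation between the two images
_, warp = cv2.findTransformECC(template, scan, warp, cv2.MOTION_AFFINE,
                               criteria, None, 5)
aligned = cv2.warpAffine(scan, warp, (template.shape[1], template.shape[0]),
                         flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
```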
Your second approach would work. Some more details: since the form has printed landmarks such as "Section A", "A6" and "other", and you mentioned corner markers for alignment, you can use those as landmarks. Perform template matching to find the coordinates of the landmarks in the original and the scanned documents, then use the two sets of landmarks (the corner marks might be sufficient) to estimate an affine transformation with M = cv2.getAffineTransform(landmarks1, landmarks2) and apply cv2.warpAffine(img, M, ...) to the scanned image to transform it to match the original document. With this, the boxes will be aligned properly (there might still be a small shift), and you can then locate each box correctly. See https://www.geeksforgeeks.org/python-opencv-affine-transformation/
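A condensed sketch of that pipeline (the landmark coordinates and file names below are placeholders; in practice the landmarks would come from template matching on the corner marks):

```python
import cv2
import numpy as np

original = cv2.imread("blank_form.png")
scanned = cv2.imread("scanned_form.png")

# Three corresponding landmarks (e.g. corner markers), found by template matching
landmarks_scanned = np.float32([[102, 98], [1540, 105], [110, 2010]])    # placeholders
landmarks_original = np.float32([[100, 100], [1536, 100], [100, 2000]])  # placeholders

# Map scanned coordinates onto the original document's coordinate frame
M = cv2.getAffineTransform(landmarks_scanned, landmarks_original)
aligned = cv2.warpAffine(scanned, M, (original.shape[1], original.shape[0]))

# With the scan aligned, the box coordinates from document generation apply directly
x, y, w, h = 200, 400, 600, 80          # one boxed field, placeholder coordinates
field = aligned[y:y + h, x:x + w]
```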
After typing all the above I found this webpage talking about the same thing with code: https://learnopencv.com/feature-based-image-alignment-using-opencv-c-python/
What method is suitable for capturing (detecting) the MRZ in a photo of a document? I'm thinking about a cascade classifier (e.g. Viola-Jones), but it seems a bit weird to use it for this problem.
If you know that you will be looking for text in a passport, why not try to find the passport model points first: match a passport template to the image using ASM/AAM (Active Shape Model, Active Appearance Model) techniques. Once you have the passport position information you can cut out the regions you are interested in. This will take some time to implement, though.
Consider this approach as a great starting point:
Black top-hat followed by a horizontal derivative highlights long rows of characters.
Morphological closing operation(s) merge the nearby characters and character rows together into a single large blob.
Optional erosion operation(s) remove the small blobs.
Otsu thresholding followed by contour detection and filtering away the contours which are apparently too small, too round, or located in the wrong place will get you a small number of possible locations for the MRZ
Finally, compute bounding boxes for the locations you found and see whether you can OCR them successfully.
It may not be the most efficient way to solve the problem, but it is surprisingly robust.
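A hedged sketch of that pipeline in OpenCV; the kernel sizes and filter thresholds are typical starting values rather than tuned settings, and "passport.jpg" is a placeholder:

```python
import cv2
import numpy as np

gray = cv2.imread("passport.jpg", cv2.IMREAD_GRAYSCALE)

rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
sq_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (21, 21))

# 1. Black top-hat emphasises dark characters on the light background
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rect_kernel)

# 2. Horizontal derivative highlights long rows of characters
grad = np.absolute(cv2.Sobel(blackhat, cv2.CV_32F, 1, 0, ksize=-1))
grad = (255 * (grad - grad.min()) / (grad.max() - grad.min())).astype("uint8")

# 3. Closing merges characters and rows into blobs, erosion removes small ones
grad = cv2.morphologyEx(grad, cv2.MORPH_CLOSE, rect_kernel)
grad = cv2.morphologyEx(grad, cv2.MORPH_CLOSE, sq_kernel)
grad = cv2.erode(grad, None, iterations=4)

# 4. Otsu threshold, then keep wide, flat contours as MRZ candidates
_, thresh = cv2.threshold(grad, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
candidates = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w / float(h) > 5 and w > 0.5 * gray.shape[1]:
        candidates.append((x, y, w, h))    # pass each crop to the OCR step
```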
A better approach would be the use of projection profile methods. A projection profile method is based on the following idea:
Create an array A with an entry for every row in your b/w input document. Now set A[i] to the number of black pixels in the i-th row of your original image.
(You can also create a vertical projection profile by considering columns in the original image instead of rows.)
Now the array A is the projected row/column histogram of your document, and the problem of detecting MRZs can be approached by examining the peaks and valleys of A (the MRZ shows up as a band of rows with consistently high black-pixel counts).
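A minimal projection-profile sketch with NumPy, assuming a binarised image in which text pixels are black (0); the 0.4 row-coverage threshold and the file name are only illustrative:

```python
import cv2
import numpy as np

gray = cv2.imread("passport.jpg", cv2.IMREAD_GRAYSCALE)
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# A[i] = number of black pixels in row i (horizontal projection profile)
A = np.count_nonzero(bw == 0, axis=1)

# The MRZ appears as an extended run of rows with a high black-pixel count;
# a simple heuristic is to threshold the profile and look for that band.
band = A > 0.4 * bw.shape[1]
mrz_rows = np.flatnonzero(band)
```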
This problem, however, is not completely solved, so there are many variations and improvements. Here's some additional documentation:
Projection profiles in Google Scholar: http://scholar.google.com/scholar?q=projection+profile+method
Tesseract-ocr, a great open source OCR library: https://code.google.com/p/tesseract-ocr/
Viola & Jones' Haar-like features generate many (many (many)) features to try to describe an object and are a bit more robust to scale and the like. Their approach was a unique approach to a difficult problem.
Here, however, you have plenty of constraint on the problem and anything like that seems a bit overkill. Rather than 'optimizing early', I'd say evaluate the standard OCR tools off the shelf and see where they get you. I believe you'll be pleasantly surprised.
PS:
You'll want to preprocess the image to isolate the characters on a white background. This can be done quite easily and will help the OCR algorithms significantly.
You might want to consider using stroke width transform.
You can follow these tips to implement it.
I want to find an algorithm which can detect broken lines or shapes in a bitmap. Consider a situation in which I have a bitmap with just two colors, black and white (like the images used in coloring books): there are curves and lines which should be connected to each other, but due to scanning errors, white pixels appear where black ones should be. How should I detect them? (After this, I want to convert the bitmaps into a vector file; I plan to work with the potrace algorithm.)
If you have any ideas, please let me know.
Here is a simple algorithm to heal small gaps:
First, use a filter which creates a black pixel when any of its eight neighbors is black. This will grow your general outline.
Next, use a thinning filter which removes the extra outline but leaves the filled gaps alone.
See this article for some filters and parameters: Image Processing Lab in C#
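A short sketch of that grow-then-thin idea with OpenCV (the thinning call needs the opencv-contrib ximgproc module; with plain OpenCV you could substitute an erosion, which essentially gives the closing described in the next answer). "drawing.png" is a placeholder:

```python
import cv2
import numpy as np

bw = cv2.imread("drawing.png", cv2.IMREAD_GRAYSCALE)
_, bw = cv2.threshold(bw, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # strokes -> white

# Grow: a pixel becomes stroke if any of its 8 neighbours is stroke
grown = cv2.dilate(bw, np.ones((3, 3), np.uint8), iterations=2)

# Thin: shrink the grown outline back to a 1-pixel skeleton, keeping the healed gaps
healed = cv2.ximgproc.thinning(grown)
```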
The simplest approach is to use a morphological technique called closing.
This will work only if the gaps in the lines are quite small in relation to how close the different lines are to each other.
How you choose the structuring element for the closing can also make the result better or worse.
The Wikipedia article is very theoretical (or mathematical), so you might want to turn to Google or any book on image processing to get a better explanation of how it is done.
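A minimal closing sketch in OpenCV; the structuring element size controls the largest gap that gets bridged, so it is the main parameter to tune, and "drawing.png" is a placeholder:

```python
import cv2

bw = cv2.imread("drawing.png", cv2.IMREAD_GRAYSCALE)
_, bw = cv2.threshold(bw, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # strokes -> white

# A 5x5 element bridges gaps up to roughly 4 pixels; larger elements heal bigger
# gaps but risk merging lines that should stay separate.
element = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
closed = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, element)
```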
Maybe the Hough Transform can help you. Bonus: you get the line parameters for your vector file.