I made patterns: images of the letter "A" at different sizes (from 12 to 72: 12, 14, ..., 72).
I tested pattern matching with them and it gave good results.
One way to select text regions from an image is to run that algorithm for all lowercase and uppercase letters and digits at different sizes. And fonts!
I don't like it. Instead, I want to make something like a universal pattern, or
better to say: scan the image with different window sizes and select those regions where some function (the probability that there is a character in that window) exceeds some fixed value.
Do you know any methods or ideas to make that function?
It must work with the original (grayscale) image.
I suppose you are developing OCR, right?
You decided to go quite an unusual way, since everyone else does matching on bi-tonal images. This makes everything much simpler. Once you have binarized the image properly (which is a very difficult task by itself), you do not have to deal with different brightness levels, uneven backgrounds, etc. And, of course, fewer computational resources are needed. However, if doing everything in grayscale is actually your goal and you want to show other OCR researchers that it is doable, well, I wish you good luck then.
The letter-location approach you described is very computationally intensive. You have to scan the whole image (image_size^2), match against the pattern at every position (* pattern_size^2), and then repeat that for each pattern (* pattern_num). This will be incredibly slow.
Instead, try to simplify your algorithm by breaking it into two stages. The first should look for some features in the picture (like connected dark regions, or split the image into large squares and throw away all the light ones), and only then perform pattern matching on the small number of areas found. This is still at least N^2, so you could try to reduce the complexity by working on rows or columns of the image first (by creating a histogram). There are a lot of different simplification methods you can play with.
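As a rough illustration of the first stage, here is a minimal Python sketch of the "split into squares and throw away the light ones" idea. The function name `dark_tiles`, the tile size, and both thresholds are assumptions made up for this example; real values would have to be tuned on your images.

```python
def dark_tiles(image, tile=4, dark_threshold=128, min_dark_fraction=0.1):
    """Return (top, left) origins of tiles that contain enough dark pixels.

    image is a list of rows of grayscale values (0 = black, 255 = white).
    """
    height, width = len(image), len(image[0])
    candidates = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            # Collect the pixels of this tile (clipped at the image border).
            block = [image[y][x]
                     for y in range(top, min(top + tile, height))
                     for x in range(left, min(left + tile, width))]
            dark = sum(1 for value in block if value < dark_threshold)
            if dark / len(block) >= min_dark_fraction:
                candidates.append((top, left))
    return candidates


# Toy 8x8 page: white background with a dark 3x3 blob in the lower-right tile.
page = [[255] * 8 for _ in range(8)]
for y in range(5, 8):
    for x in range(5, 8):
        page[y][x] = 20

print(dark_tiles(page))  # [(4, 4)]
```

Only the surviving tiles would then be handed to the expensive pattern matching.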
After you have located those objects in the picture and are about to match patterns against them, you actually know their size, so you don't have to store the letter A in all sizes. You can just rescale the original image of the object to one size, say 72, and match it.
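A minimal sketch of that rescale-then-match step, in plain Python. It uses nearest-neighbour resampling and a mean-absolute-difference score purely as assumptions for illustration (a real matcher would more likely use proper interpolation and something like normalized correlation), and a tiny 4x4 template stands in for the size-72 one.

```python
def resize_nearest(glyph, size):
    """Rescale a 2D list to size x size using nearest-neighbour sampling."""
    height, width = len(glyph), len(glyph[0])
    return [[glyph[y * height // size][x * width // size]
             for x in range(size)]
            for y in range(size)]


def similarity(a, b):
    """1.0 for identical grayscale images, 0.0 for maximally different ones."""
    total = sum(abs(pa - pb)
                for row_a, row_b in zip(a, b)
                for pa, pb in zip(row_a, row_b))
    return 1 - total / (255 * len(a) * len(a[0]))


# Hypothetical stored template at the single reference size (4x4 here).
template = [
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0,   0,   0, 0],
    [0, 255, 255, 0],
]

# A located object that shows the same shape at twice the size (8x8).
found = [[template[y // 2][x // 2] for x in range(8)] for y in range(8)]

normalized = resize_nearest(found, 4)
print(similarity(normalized, template))  # 1.0
```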
As for fonts, you don't really have much choice here: you will need to match against all possible shapes of A to make sure you have found an A. But once you match against just one size of A, you have more computing power left to try different A's.
I'm trying to extract handwritten text from an image to enable OCR. My forms contain text boxes, so it is not too complex to get the right regions of interest, but the problem is that most people have trouble staying within the boundaries of the boxes. While I can increase the area to cover for this, the result is that I get my string plus some part of the boxes above and below.
Like the image below:
Depending on the level of pollution at the top or bottom of the picture, the OCR software either happily ignores it or adds random nonsense. So, to be safe, I need to get rid of as much of it as possible, while at the same time keeping my 'full' letters intact to ensure there is enough quality left for the OCR part.
The expected output should just show ITEGEM (which is a small place in Belgium, nothing fancy here)
like this :
I've been trying a few things, but standard blob detection is too harsh: it also removes part of the first T, because there are a few pixels of gap between the top of the T and its base, so I am left with an I instead of a T.
Any suggestions to get me back on track (preferably python)?
I have a collection of type-written image captions which look like this:
I know that the typewriter is consistent and monospace, with characters measuring 14x22px (as measured from the top of a capital letter to the bottom of a descender).
Tesseract is producing output like this:
The results are mostly good when Tesseract has detected the correct bounding boxes for the letters. But there are many strings of letters which are clumped together (e.g. "Ea", "tree", "fr" and "om" on the first line). These are always transcribed incorrectly and account for the majority of errors.
This is frustrating because I know a priori that all the characters are of a particular size. Is it possible to pass this knowledge on to the tesseract command line tool?
My command to generate the box file is:
tesseract foo.jpg foo batch.nochop makebox
If possible, I'd prefer to avoid training Tesseract on the font—I don't have any manually transcribed samples, so building a corpus of training data would require some effort.
I'm not sure that Tesseract throws connected characters completely off as Noremac said.
Actually, I think that it includes a step that chops joined characters whenever the result of a word detection is unsatisfactory, as explained in section 4.1 of "An Overview of the Tesseract OCR Engine".
And I also think that once it finds a fixed pitch text, it should automatically chop the text, even if the characters are connected (look at figure 2 of the same paper).
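To make the idea of chopping at a fixed pitch concrete, here is a small Python sketch. It is not Tesseract's internal algorithm, only an illustration of what splitting a connected blob at a known pitch could look like. The helper `chop_fixed_pitch` is hypothetical, and it assumes the character advance equals the 14 px character width from the question, which may need adjusting for the real spacing.

```python
def chop_fixed_pitch(left, right, pitch=14):
    """Split the horizontal span [left, right) into cells of about pitch px."""
    width = right - left
    # Estimate how many characters fit into the blob, at least one.
    count = max(1, round(width / pitch))
    # Integer arithmetic keeps the cut positions deterministic.
    edges = [left + i * width // count for i in range(count + 1)]
    return list(zip(edges, edges[1:]))


print(chop_fixed_pitch(100, 129))  # [(100, 114), (114, 129)]
print(chop_fixed_pitch(10, 23))    # [(10, 23)]
```

A blob about two characters wide is cut in the middle, while a blob of roughly one character width is left alone.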
I know that it's a little bit late to add this answer, but maybe it will help some future visitors!
The issue isn't the font size so much as the letters connecting. If you zoom in on the above images with a program that shows the actual pixels (rather than blurring them together), you can see that the characters in those groups are actually connected. Tesseract OCR is completely based on connected components, so if characters are connected at all, it is thrown off completely. I see a couple of options:
If possible, give it a higher resolution image where there is more separation between the characters.
Adjust the preprocessing to use a stricter threshold.
I noticed that the pixel connecting the E and the a in the first occurrence is lighter, so adjusting the threshold will remove that connection. However, this could affect more than you want, such as disconnecting characters where you don't expect it.
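A toy Python example of why a stricter threshold can break such a bridge. The pixel values and both thresholds are invented for illustration; they are not measured from the images above.

```python
def binarize(gray, threshold):
    """Mark a pixel as ink (1) when it is darker than threshold, else 0."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def ink_runs(row):
    """Count maximal runs of consecutive ink pixels in one binary row."""
    runs, previous = 0, 0
    for pixel in row:
        if pixel and not previous:
            runs += 1
        previous = pixel
    return runs


# One scanline: two dark strokes (30) joined by a lighter bridge pixel (140).
scanline = [[255, 30, 30, 140, 30, 30, 255]]

print(ink_runs(binarize(scanline, 160)[0]))  # 1: the bridge joins the strokes
print(ink_runs(binarize(scanline, 100)[0]))  # 2: the bridge is dropped
```

The same stricter threshold would also drop any legitimately light stroke, which is the side effect mentioned above.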
For updating the thresholding consider this: https://groups.google.com/forum/#!topic/tesseract-ocr/JRwIz3xL45U
I'm implementing an application that searches for a photo in a catalog of textures by comparing histograms.
In order to improve the accuracy, what processing should I apply to the photo to normalize/clean it before matching it against the catalog?
UPDATE
I added an actual photo taken with the Android camera, and the desired matching image that is saved in the catalog.
How can I process the photo to correct the colors, enhance it, and make a better match with the catalog possible?
It really depends on the textures. I think the question to ask is what variation is acceptable and what variation do you want to remove from the catalog. Or put another way, which features do you care about and want to search for?
For example, if color is not important, then a step in normalizing/cleaning would be to convert all to grayscale to remove potential variations. Perhaps a more pertinent example would be that you only want to compare against the strongest edges in the texture so you would blur out the weaker edges.
It all really depends on your specific use case. Consider what you really want to match against and the more specific you can get, the more normalizing and cleaning you can do, and the more accurate your application will be.
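As one possible instance of such normalization, the Python sketch below drops color by converting to grayscale, builds a histogram normalized by pixel count (so image size does not matter), and compares two histograms by intersection. The helper names, the bin count, and the choice of histogram intersection as the similarity measure are assumptions for the example, not something the question prescribes.

```python
def to_gray(pixels):
    """Convert (r, g, b) tuples to luma values using the Rec. 601 weights."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]


def normalized_histogram(values, bins=4):
    """Histogram of 0..255 values with bin counts scaled to sum to 1."""
    counts = [0] * bins
    for value in values:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [count / len(values) for count in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical data: a 4-pixel photo and an 8-pixel catalog texture with
# the same tonal distribution, plus an unrelated all-dark texture.
photo = [(10, 10, 10), (200, 200, 200), (200, 200, 200), (90, 90, 90)]
catalog_entry = [(10, 10, 10)] * 2 + [(90, 90, 90)] * 2 + [(200, 200, 200)] * 4
dark_texture = [(10, 10, 10)] * 4

h_photo = normalized_histogram(to_gray(photo))
print(histogram_intersection(h_photo, normalized_histogram(to_gray(catalog_entry))))  # 1.0
print(histogram_intersection(h_photo, normalized_histogram(to_gray(dark_texture))))   # 0.25
```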
Your question should contain example data and your solution attempt. Let's say you want to find out how much JPEG compression at level 0.35 distorts an image.
img = Import["http://farm1.staticflickr.com/62/171463865_36ee36f70e.jpg"]
img2 = ImportString@ExportString[img, "JPEG", "CompressionLevel" -> 0.35]
diff = ImageSubtract[Image[img, "Real"], Image[img2, "Real"]]
ArrayPlot[0.5 + 5 ImageData[First@ColorSeparate[diff, "Red"]],
ColorFunction -> "RedGreenSplit", ColorFunctionScaling -> False]
The difference is slight, so the output difference is amplified 5 times.
The example was done using Mathematica (aka the Wolfram Language). The original question was: How can I tell exactly what changed between two images?
Other than that, @Noremac is right: it really depends on your specific use case.
I have the following image, which has been binarized.
I need to segment this image and recognize the digits. The two '4' digits and the '9' are connected together.
I read some documents that mention the 'watershed morphology' method. The following image shows the result of a watershed segmentation.
It's obvious that the two '4' digits are still connected, but the '9' has been segmented successfully.
I need some help with segmenting the '44' characters. Thanks!
zhengchun,
you need to understand that this is a quite difficult task which, in my opinion, cannot be perfectly solved in all cases.
In the first place, correctly splitting between characters without prior knowledge of their size and shape is just impossible: consider the letter W, which could very well be split into two V's; conversely, nothing can tell you that two accidentally touching letters IJ are indeed two different letters rather than a U.
This means that no "blind" method like the watershed or any other can succeed, whatever the sophistication. Geometry alone is not enough, you need to rely on some description of the font (sizes and shapes).
To the best of my knowledge, you must let segmentation and recognition work together. What you can do is:
use the initial segmentation, hoping that touching and broken characters do not arise so often;
starting from the left, try immediate character recognition by splitting after one character width (you will need to try every font character in turn, possibly with different widths);
keep the most likely recognition result and continue recognition from that split, to the right;
if you expect broken characters, you can as well try recognitions that span two or more blobs and group these. (Gaps between blobs are good hints for splits, unless your characters can be broken or miss parts.)
You can improve the above procedure by adding heuristics to decide where splits are more likely, such as at a height minimum, but this is tricky. A pinch of black magic...
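A toy Python sketch of the left-to-right procedure above. The three-glyph `FONT`, the column-string image representation, and the exact-column scoring are all invented for illustration; a real recognizer would score pixel windows against actual font data.

```python
# Hypothetical toy font: each glyph is a list of 3-pixel-high column strings.
FONT = {
    "I": ["111"],
    "L": ["111", "001"],
    "T": ["100", "111", "100"],
}


def recognize_at(columns, pos):
    """Try every glyph at pos; return the best (score, width, char)."""
    best = (0.0, 0, "?")
    for char, glyph in FONT.items():
        window = columns[pos:pos + len(glyph)]
        if len(window) < len(glyph):
            continue  # the glyph would run past the end of the line
        hits = sum(a == b for a, b in zip(window, glyph))
        # Ties on score are broken in favour of the wider glyph.
        candidate = (hits / len(glyph), len(glyph), char)
        if candidate > best:
            best = candidate
    return best


def read_line(columns, min_score=0.5):
    """Greedy reading: keep the best glyph, then continue after its width."""
    pos, text = 0, []
    while pos < len(columns):
        score, width, char = recognize_at(columns, pos)
        if score > min_score:
            text.append(char)
            pos += width
        else:
            text.append("?")  # nothing fits here, skip one column
            pos += 1
    return "".join(text)


# "L" and "T" touching, with no gap between them.
touching = FONT["L"] + FONT["T"]
print(read_line(touching))  # LT
```

Being greedy, this never revisits an earlier split, so a wrong early choice propagates to the right; that is where the extra heuristics come in.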
All I can find on the web is about OCR, but I'm not there yet; I still have to recognize where the letters are in the image.
Any help will be appreciated.
The interesting thing is that the answer is not as simple as it may seem. Some may think that locating characters in the picture is the first step of OCR, but that is not the case. Actually, you won't be sure where each character is located until you have finished recognizing.
The way it works depends completely on the type of image you are going to recognize. First you should segment your image into text areas (blocks) and everything else.
Just a few examples:
If you are recognizing a license plate in a picture of a car, you should first locate the license plate, and only then split it into separate characters.
If you are recognizing an application form, you can locate the areas where the text is just by knowing its layout.
If you are recognizing a scan of a book page, you have to distinguish pictures from text areas and then work only on the text.
From this moment on you don't need the original image any more; all you need is a binarized image of the text block. Most classical OCR algorithms work on binary images. You may also need other kinds of image transformations, like line straightening, perspective correction, skew correction and so on. Again, all of that depends on the type of images you are recognizing.
Once the text block is found and normalized, you should go further and find the lines of text in it. In the trivial case of horizontal lines of text, this is quite simple: create a histogram of the ink pixels in each horizontal row.
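A minimal Python sketch of that row histogram (often called a horizontal projection profile), assuming a binarized block where 1 means ink. The function name `find_text_lines` is made up for this example, and it treats any row without ink as a gap between lines.

```python
def find_text_lines(binary):
    """Return (top, bottom) row ranges, bottom exclusive, that contain ink."""
    profile = [sum(row) for row in binary]  # number of ink pixels per row
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink and start is None:
            start = y                      # a text line begins
        elif not ink and start is not None:
            lines.append((start, y))       # an empty row ends the line
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines


block = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(find_text_lines(block))  # [(1, 3), (4, 5)]
```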
Now that you have lines, you may think that the rest is simple: you can split them into characters, hooray! Again, that is wrong. There are such phenomena as connected characters, broken characters and even ligatures (two letters forming one single shape), or letters whose parts extend to the right, above or below the next character. What you should do is create several hypotheses for splitting the line into words and individual characters, then try OCR on every single variant and weight every hypothesis with a confidence level. The last step would be checking the different paths in this graph using a dictionary and selecting the best one.
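The hypothesis graph can be pictured with a small Python sketch. Everything here is invented for illustration: the edges (a "cl" versus "d" ambiguity), the confidence values, and the way the dictionary is applied as a simple multiplicative bonus. A real engine would use a shortest-path or beam search rather than enumerating every path.

```python
# Hypothetical lattice over cut positions 0..3:
# (start, end, recognized char, confidence).
EDGES = [
    (0, 1, "c", 0.8), (1, 2, "l", 0.8),  # two narrow characters
    (0, 2, "d", 0.5),                    # or one wide character
    (2, 3, "o", 0.9),
]
DICTIONARY = {"do"}


def all_paths(node, goal):
    """Yield (word, confidence) for every path from node to goal."""
    if node == goal:
        yield "", 1.0
        return
    for start, end, char, conf in EDGES:
        if start == node:
            for rest, rest_conf in all_paths(end, goal):
                yield char + rest, conf * rest_conf


def best_reading(goal, dictionary_bonus=2.0):
    """Pick the highest-scoring path, boosting words found in the dictionary."""
    scored = []
    for word, conf in all_paths(0, goal):
        if word in DICTIONARY:
            conf *= dictionary_bonus
        scored.append((conf, word))
    return max(scored)


print(best_reading(3, dictionary_bonus=1.0)[1])  # clo
print(best_reading(3)[1])                        # do
```

Without the dictionary, the per-character confidences favour "clo"; with it, the split that yields a real word wins.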
And only now, when you actually recognized everything, you can say where individual characters are located.
So, the simple answer is: recognize your image with an OCR program, and get the coordinates of the characters from its output.
Generally speaking, you'll be looking for small contiguous areas of nearly solid color. I would suggest sampling each pixel and building an array of nearby pixels that also fall within a threshold of the original pixel's color (repeat for the neighbours of each matching pixel). Put the entire array aside as a potential character (or check it now) and move on (potentially ignoring previously collected pixels for a speedup).
Optimisations are possible if you know in advance the font-size, quality and/or color of the text. If not you'll want to be fairly generous with your thresholds of what constitutes a "contiguous area".
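The region-growing idea above could be sketched in Python roughly as follows. It assumes a grayscale image as a list of rows, 4-connectivity, and a tolerance measured against the seed pixel; the names `grow_regions` and `tolerance` and the size cutoff in the usage lines are assumptions for this example.

```python
from collections import deque


def grow_regions(gray, tolerance=30):
    """Group 4-connected pixels whose value is within tolerance of the seed."""
    height, width = len(gray), len(gray[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for sy in range(height):
        for sx in range(width):
            if seen[sy][sx]:
                continue  # already collected, skip for speed
            seed = gray[sy][sx]
            seen[sy][sx] = True
            queue, region = deque([(sy, sx)]), []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and abs(gray[ny][nx] - seed) <= tolerance):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


image = [
    [250, 250, 250, 250],
    [250,  20,  25, 250],
    [250,  22, 250, 250],
]
regions = grow_regions(image)
# Keep only small regions as potential characters (cutoff is arbitrary).
small = [region for region in regions if len(region) <= 4]
print(len(regions), sorted(small[0]))  # 2 [(1, 1), (1, 2), (2, 1)]
```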