I have a table similar to this where the text is actually handwritten. I want to be able to check whether each field has been filled out, but understanding or recognising what each field says is not necessary. I just need to be able to detect that a field has content.
example of table
People will have to scan pages with a scanner, and the program should essentially detect the fields and check whether they have any contents. Does anyone have any ideas or know of simple solutions? I was thinking of using ICR or OCR, but OCR can't detect handwritten text, and ICR is only good if you pay for it, and then it is overkill for this task.
This could be very easy, but it depends on how static the situation is. Convert the image to grayscale, then apply a threshold to separate black from white. Next, ignore the white regions that are too small and belong to letters like a or o, and then apply a closing to the remaining regions to get the individual table cells. Now you can determine the average gray value at the position of each region. If it is above a certain value, you have found a cell that is filled in.
This method only works if the white background of a cell is somehow connected; otherwise the closing will not work as desired. The lighting situation is also critical when using fixed threshold values.
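A minimal sketch of that check, assuming the cell positions on the scanned page are already known. The coordinates, the 2% ink threshold and the cell names below are illustrative, and the fraction of dark pixels stands in for the average gray value (they are equivalent for this purpose):

```python
import numpy as np

# Hypothetical cell positions (y0, y1, x0, x1) measured on the scanned form.
CELLS = {
    "name":    (40, 80, 100, 400),
    "address": (90, 130, 100, 400),
}
INK_THRESHOLD = 128   # gray values below this count as ink
INK_FRACTION = 0.02   # more than 2% dark pixels -> treat the cell as filled

def filled_cells(gray, cells=CELLS):
    """Return a dict mapping cell name -> True if the cell contains ink."""
    result = {}
    for name, (y0, y1, x0, x1) in cells.items():
        roi = gray[y0:y1, x0:x1]
        dark = (roi < INK_THRESHOLD).mean()  # fraction of dark pixels
        result[name] = dark > INK_FRACTION
    return result

# Synthetic example: a white page with a scribble in the "name" cell only.
page = np.full((200, 500), 255, dtype=np.uint8)
page[50:70, 150:300] = 0
print(filled_cells(page))  # {'name': True, 'address': False}
```

In practice you would threshold the scan first (as described above) so that shadows and paper texture do not count as ink.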
I'm trying to extract handwritten text from an image to enable OCR. My forms contain textboxes, so it is not too complex to get the right regions of interest, but the problem is that most people have trouble staying within the boundaries of the boxes. While I can increase the area to compensate for this, the result is that I get my string plus parts of the box border above and below it.
Like below image
Depending on the level of pollution at the top or bottom of the picture, the OCR software either happily ignores it or adds random nonsense. So, to be safe, I need to get rid of as much of it as possible, while at the same time keeping my letters intact to ensure there is enough quality left for the OCR step.
The expected output should just show ITEGEM (which is a small place in Belgium, nothing fancy here)
like this :
I've been trying a few things, but standard blob detection is too harsh: it also removes part of the first T, because there are a few pixels between the top stroke of the T and its stem, so I'm left with an I instead of a T.
Any suggestions to get me back on track (preferably python)?
I would like to customize my histogram to show an array of colors that reflect the frequency of each value. Essentially, if a value appears 6 times I would like its bar to be bright red, whereas if it appears 2 times it would be dark red. Is there a way to assign colors to each bar manually, or perhaps a way the program can automatically assign colors based on how often each data point appears?
I'd also like the colouring of my graph to stay similar to the color bar attached in the image below.
I haven't been able to find a walkthrough that covers what I'm looking for. I am very green with Octave and am not great at navigating it without explicit instruction. Thank you for the help!
I have a few images. Some of them contain text and others don't contain any text at all. I want a robust algorithm that can conclude whether an image contains text or not.
Even probabilistic algorithms are fine.
Can anyone suggest such an algorithm?
Thanks
There are some specifics that you'll want to pin down:
Will there be much text in the image? Or just a character or two?
Will the text be oriented properly? Or does rotation also need to be performed?
How big will you expect the text to be?
How similar to the text will the background be?
Since images can vary significantly, you want to define the problem and find as many constraints as you can to make it as simple as possible. It's a difficult problem.
For such an algorithm you'll want to focus on what makes text distinct from the background (consistent spacing between characters and lines, consistent height, consistent baseline, etc.). There's an area of research called "text detection" that you'll want to investigate, and you'll find a number of algorithms there. Two surveys of these methods can be found here and here.
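As a toy illustration of one of those cues: the row-wise projection profile of a binarized image shows regular bands of ink where lines of text sit. The band-count threshold below is an illustrative choice, not something from the surveys:

```python
import numpy as np

def looks_like_text(binary, min_bands=2):
    """Crude text cue: count alternating ink/background bands in the
    row-wise projection profile of a binary image (True = ink)."""
    profile = binary.sum(axis=1)            # ink pixels per row
    has_ink = profile > 0
    # Count transitions from an empty row to an inked row.
    bands = np.count_nonzero(has_ink[1:] & ~has_ink[:-1]) + int(has_ink[0])
    return bands >= min_bands

# Synthetic page: three horizontal "lines of text".
page = np.zeros((100, 200), dtype=bool)
for top in (10, 40, 70):
    page[top:top + 8, 20:180] = True
print(looks_like_text(page))   # True

blank = np.zeros((100, 200), dtype=bool)
print(looks_like_text(blank))  # False
```

A single blob of noise also produces one band, which is why a real detector combines several such cues rather than relying on one.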
I am creating an app at the moment which takes a picture of a person's face, and I want to change the colour of their skin (just for fun!).
I have a piece of code that runs through the image pixel by pixel, finds the skin colour, and then changes it to a new colour. This kind of works, but even though I am allowing for differences in tone and adjusting the new colour in the same way, it is still very hit and miss.
Can anyone point me in the right direction? Is it even possible? I don't really want to use a filter, as I don't think it would give the right effect.
Thanks
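The per-pixel approach described above can be sketched like this with NumPy; the RGB skin rule and the 50% blend toward a target colour are common rules of thumb, not something taken from the question:

```python
import numpy as np

def recolor_skin(img, target=(0, 180, 0), blend=0.5):
    """Shift skin-toned pixels toward `target` while keeping some shading.

    Uses a common RGB rule of thumb for skin detection; both the rule and
    the 50% blend factor are illustrative choices.
    """
    img = img.astype(np.float32)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    skin = (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b)
    out = img.copy()
    out[skin] = (1 - blend) * img[skin] + blend * np.array(target, np.float32)
    return out.astype(np.uint8), skin

# 2x2 test image: one skin-toned pixel, three non-skin pixels.
img = np.array([[[200, 120, 90], [10, 10, 10]],
                [[0, 0, 255],   [50, 200, 50]]], dtype=np.uint8)
out, mask = recolor_skin(img)
print(mask.sum())  # 1 -- only the first pixel matched the skin rule
```

The "hit and miss" results usually come from the detection step, which is why the answer below points at segmentation rather than a fixed colour rule.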
You should look at some computer vision techniques such as segmentation and feature extraction.
All I can find on the web is about OCR, but I'm not there yet; I still have to recognize where the letters are in the image.
Any help will be appreciated.
The interesting thing is that the answer is not as simple as it may seem. Some may think that locating characters in the picture is the first step of OCR, but that is not the case. Actually, you won't be sure where each character is located until you have actually finished recognizing.
How it works depends entirely on the type of image you are going to recognize. First you should segment your image into text areas (blocks) and everything else.
Just a few examples:
If you are recognizing a license plate in a picture of a car, you should first locate the license plate, and only then split it into separate characters.
If you are recognizing an application form, you can locate the text areas just by knowing its layout.
If you are recognizing a scan of a book page, you have to distinguish pictures from text areas and then work only on the text.
From this moment on you don't need the original image any more; all you need is a binarized image of the text block. All OCR algorithms work on binary images. You may also need other kinds of image transformations such as line straightening, perspective correction, skew correction and so on; all of that again depends on the type of images you are recognizing.
Once a text block is found and normalized, you should go further and find the lines of text within it. In the trivial case of horizontal lines of text, this is quite simple: build a histogram of ink pixels per horizontal line.
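In that trivial horizontal case, the histogram-based line search might look like this (segmenting on completely empty rows is a simplification; real pages need a noise threshold):

```python
import numpy as np

def find_text_lines(binary):
    """Return (top, bottom) row ranges of text lines in a binary image
    (True = ink) by scanning the row-wise pixel histogram for gaps."""
    profile = binary.sum(axis=1)  # ink pixels per row
    lines, top = [], None
    for y, count in enumerate(profile):
        if count > 0 and top is None:
            top = y                 # a line starts at the first inked row
        elif count == 0 and top is not None:
            lines.append((top, y))  # ...and ends at the next empty row
            top = None
    if top is not None:
        lines.append((top, len(profile)))
    return lines

# Two synthetic lines of "text".
page = np.zeros((60, 100), dtype=bool)
page[10:20, 5:95] = True
page[35:45, 5:95] = True
print(find_text_lines(page))  # [(10, 20), (35, 45)]
```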
Now that you have lines, you may think it's simple from here: just split each line into characters, hooray! Again, that is wrong. There are such phenomena as connected characters, broken characters and even ligatures (two letters forming one single shape), or letters whose parts extend to the right above or below the next character. What you should do is create several hypotheses for splitting a line into words and individual characters, try OCR on every single variant, and weight every hypothesis with a confidence level. The last step is checking the different paths in this graph against a dictionary and selecting the best one.
And only now, when you have actually recognized everything, can you say where the individual characters are located.
So the simple answer is: recognize your image with an OCR program, and get the coordinates of the characters from its output.
Generally speaking you'll be looking for small contiguous areas of nearly solid color. I would suggest sampling each pixel and building an array of nearby pixels that also fall within a threshold of the original pixel's color (repeat for the neighbours of each matching pixel). Put the entire array aside as a potential character (or check it now) and move on (potentially ignoring previously collected pixels for a speedup).
Optimisations are possible if you know the font size, quality and/or color of the text in advance. If not, you'll want to be fairly generous with your thresholds for what constitutes a "contiguous area".
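A sketch of that region-growing idea on a grayscale image, assuming a 4-connected neighbourhood and an illustrative tolerance of 30 gray levels:

```python
import numpy as np
from collections import deque

def grow_region(img, seed, tol=30):
    """Collect the contiguous pixels whose gray value is within `tol`
    of the seed pixel (4-connected flood fill)."""
    h, w = img.shape
    base = int(img[seed])
    seen = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    seen[seed] = True
    region = []
    while queue:
        y, x = queue.popleft()
        region.append((y, x))
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and not seen[ny, nx]
                    and abs(int(img[ny, nx]) - base) <= tol):
                seen[ny, nx] = True
                queue.append((ny, nx))
    return region

# A dark 3x3 "character" on a white background.
img = np.full((10, 10), 255, dtype=np.uint8)
img[2:5, 2:5] = 10
print(len(grow_region(img, (3, 3))))  # 9
```

Each collected region can then be size-filtered: tiny regions are noise, page-sized ones are background, and character-sized ones are your candidates.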