I'm doing a DIP project. I want to count the total number of words in each paper using Image Processing.
The original image is:
I did some pre-processing and produced the image below:
My idea to count the total number of words in each paper is to detect the digits inside blobs.
So please guide me: how can I count the words in this image? What's your idea?
Thanks.
Using the digits inside blobs/circles is a good problem definition. I would recommend running a circle Hough transform, looking only for circles of a certain radius, and then counting the number of circles detected. You'll have to figure out what your radius is in pixels, but this might be a good starting point. Good luck.
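A minimal sketch of that idea using OpenCV's Python bindings, assuming the preprocessed page is a grayscale file named page.png and that the circled numbers are roughly 15-30 px in radius (the file name and radii are placeholders to tune):

import cv2

# Load the preprocessed page as grayscale and smooth it slightly,
# which helps the gradient-based Hough detector.
img = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
img = cv2.medianBlur(img, 5)

# Look only for circles in the radius range of the numbered blobs.
circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                           param1=100, param2=30, minRadius=15, maxRadius=30)

count = 0 if circles is None else circles.shape[1]
print("detected circles (word count estimate):", count)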
If all pages are somewhat cleanly separated with one definition per line, you could take a very simple approach of counting the filled lines. First detect the list on the page so that irrelevant markings are ignored (green box); this does not have to detect the edge exactly, as long as the bounds are no bigger than the list.
Then look for horizontal rows of pixels with no marking on them, or no value darker than some threshold X. This is illustrated below with the pink horizontal lines. Lastly, count the filled lines (any discrete run of rows that is not empty) and you have your number of definitions.
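A rough sketch of that row scan, assuming page is a grayscale numpy array already cropped to the list region; the threshold names and values are placeholders:

import numpy as np

def count_filled_runs(page, dark_thresh=128, min_dark_pixels=3):
    # A row counts as "filled" if it contains at least a few dark pixels.
    dark_per_row = (page < dark_thresh).sum(axis=1)
    filled = dark_per_row >= min_dark_pixels
    # Each empty-to-filled transition starts a new definition.
    transitions = np.diff(filled.astype(int))
    return int((transitions == 1).sum()) + (1 if filled[0] else 0)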
I have read about the U-Matrix in many places, including this site. The best explanation of a U-Matrix I have found is here on this site, together with an explanation of why there is so little correct information about how the U-Matrix is properly calculated (the original paper is not at all useful).
The answer to that question completely explains the concept for a hexagonal map, but the logic for calculating the U-Matrix given in that answer does not hold when the map is rectangular.
For example consider 3 x 3 rectangular lattice as shown below.
Using the above lattice I can calculate the U-Matrix as shown below.
The yellow colored squares are the distances between the blue colored squares. I'm certain about the yellow colored squares. I'm also certain about the blue colored squares, since we only need to take the average or the median of their surroundings.
So my question is : How to calculate the red squares?
I found a few sources, including the ones mentioned in the question I cited above. The best explanations I found for a rectangular U-Matrix are the following:
Description 1 -> In this paper the authors do not completely explain how to calculate the red squares; they only say that the average of the surroundings should be taken, which is not clear and, in my opinion, not proper (see below).
Description 2 -> In this paper the authors clearly state how to calculate the red squares, but the logic they present seems flawed.
My explanation why the above might not be proper
If one takes the average of its surroundings to calculate a red square, as Description 1 suggests, the calculation of the blue squares is directly affected. For example, consider calculating the value of blue square number 1 in the U-Matrix. To take the average of its surroundings we need the distances (1,2), (1,4) and (1,5). If we fill the corresponding red square with (1,5), the calculation for blue square 4 becomes wrong, because we never calculated (2,4) and that same red square should be the place to hold it. The equation that divides the sum of (1,5) and (2,4) by 2*(1.414...) will not work either, since it contains a component that does not belong to the average: in the case of blue square 1, the distance contribution of (2,4) does not belong there.
I implemented the method described in the second paper, and the U-Matrix generated for a simple data set is not satisfactory, whereas the plain average of distances around a given node performs better on the same data set, as shown below. (The images are the U-Matrix followed by the average.)
I didn't read the papers you mentioned and I have been working mostly with hexagonal maps, but it seems the most reasonable solution is to calculate the red squares as averages of the yellow squares, since these are their neighbours. When you use rectangular maps there are no diagonal connections; if there were, it would be more like a hexagonal map. So the yellow squares are the ones you should take into account. Think of the red squares as "fake" map units that fill in the gaps created by the interpolation of the nodes made in the U-Matrix.
By the way, hexagonal maps are considered better at capturing the topology of the underlying dataset.
I really appreciate the question. I agree with the answer that thinking of the red squares as "fake" map units and hence assigning them the average values of the yellow squares is a good solution. Further, and more simply, we can create a distance map of the same size as the training grid and assign to each square the average of its neighbours' distances. I found this is the solution adopted by minisom. See the beginning of its distance_map() function below for convenience.
from numpy import zeros

def distance_map(self):
    """Returns the distance map of the weights.
    Each cell is the normalised sum of the distances between
    a neuron and its neighbours."""
    um = zeros((self._weights.shape[0], self._weights.shape[1]))
    # ... (the rest of the function accumulates each neuron's neighbour distances into um)
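For completeness, here is a minimal self-contained sketch of the same idea (not minisom's actual code): each cell of the map is the average Euclidean distance between a node's weight vector and those of its 4-connected neighbours on a rectangular grid. The weights array name and shape are assumptions:

import numpy as np

def rectangular_u_matrix(weights):
    # weights has shape (rows, cols, dim): one weight vector per grid node.
    rows, cols, _ = weights.shape
    um = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            # 4-connected neighbours only: a rectangular grid has no diagonal links.
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            um[i, j] = np.mean(dists)
    return um / um.max()  # normalise, as minisom does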
I'm a bachelor student and currently working on a final project in Optical Braille Recognition using a real-time camera.
I've successfully converted the image to HSV format and extracted only the value channel of the HSV image to prevent ambient light from affecting it, then performed binary thresholding, Canny edge detection, erosion, and dilation to keep only the Braille dots from the camera.
What I would like to ask is: how do I perform segmentation when the distance between the dots keeps changing as the camera moves nearer to or further from the Braille writing?
Any assistance would be appreciated.
Thank you
To do this, you would detect some relative pair of coordinates that lets you estimate the "scale" of the Braille writing in your image. This can be an identifying pair of points on either end of the writing, or even just some characteristic dots. With the scale known, you can transform the image to a uniform size regardless of how far away the camera is.
There is no simple, general solution to your problem. If I cannot immediately see how these Braille letters are spaced out, it will certainly not be easy to solve with a simple algorithm.
Your best bet is to read literature on Braille text, talk with your prof, and have a blind person explain to you how they read Braille.
Other than that, you would have to find the baselines of the Braille text lines and see how they differ, then run cvPerspectiveTransform to straighten out the image, so you can segment the dots without worrying about perspective.
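A rough sketch of that last step using OpenCV's Python API (the modern counterpart of cvPerspectiveTransform), assuming you have already located four corner points of the Braille block; the coordinates and file name below are placeholders:

import cv2
import numpy as np

# Four detected corners of the Braille block in the camera frame (placeholders),
# and the axis-aligned rectangle we want to map them onto.
src = np.float32([[105, 80], [520, 95], [530, 390], [110, 400]])
dst = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])

M = cv2.getPerspectiveTransform(src, dst)
img = cv2.imread("braille_frame.png")
straight = cv2.warpPerspective(img, M, (400, 300))
# The dots in `straight` now lie on an approximately fixed-scale grid,
# so a fixed cell size can be used for segmentation.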
This challenge is very similar to the issues I've encountered in my barcode system. My answer is a generalized description of the method I use.
I'd start by dividing the image into a grid in which a single character cell fits within a single grid cell. This guarantees that any character fits within a 2x2 group of grid cells, no matter how the grid overlays the image.
Convert the image into dots. Dots are identified locally, using a small area of pixels.
Assign each dot a grid cell number. This can be something as simple as the x/y location divided by a 32-pixel cell size: ((y/32)*(width/32)) + (x/32).
Keep a count of dots per grid cell and when all the dots are identified, sort the dot table by grid number and build an index by displacement in the table and number of elements.
If the resolution varies, sample some cells with lots of dots to determine distance between cell pairs.
Look through the cells row by row, but examine each cell as part of a 2x2 cell group. This way, any dot in the cell being tested is guaranteed to be matched to a paired dot (if one exists). By using the grid, dots only need to be matched against dots local to them, so while the image may have thousands of dots, an individual dot only has to be tried against 1-10 others (see the sketch at the end of this answer).
Pairing dots will create duplicates, which can either be prevented while matching or purged later.
At this point is where you would need to match the dots to Braille. Horizontal pairs of pairs and vertical pairs of pairs should be able to start lining up the Braille text.
Once the lines are aligned, the dot table would be rotated into the text alignment determined. The pairs would be put into alignment; then, starting from the position of each pair, unmatched dots could be added by matching the grid location of the pair against unpaired dots in the dot table.
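A minimal sketch of the grid-bucketing and local-candidate lookup described above, assuming dots is a list of (x, y) dot centres already extracted and a hypothetical 32-pixel cell size:

from collections import defaultdict

CELL = 32  # assumed grid cell size in pixels

def bucket_dots(dots):
    # Map each dot to the grid cell that contains it.
    grid = defaultdict(list)
    for x, y in dots:
        grid[(x // CELL, y // CELL)].append((x, y))
    return grid

def local_candidates(grid, cx, cy):
    # Gather dots from a 2x2 block of cells, so any dot's partner
    # (if one exists) is guaranteed to be among the candidates.
    out = []
    for dx in (0, 1):
        for dy in (0, 1):
            out.extend(grid.get((cx + dx, cy + dy), []))
    return out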
I have a bunch of "simple" images and I want to check whether they are similar to each other. I compare them using template matching (cv::matchTemplate) and the results are quite good.
Now I want to fine-tune my program, and I face a problem. For example, I have two images that look very much alike; the only differences are that one has a thicker line and the digit in front of the item is different. When both images are small, a one-pixel difference in line thickness makes a big difference in the template matching result. When the line thicknesses are the same and the only difference is the front digit, I get a matching result of about 0.98 with CV_TM_CCORR_NORMED for a successful match. When the line thickness differs, the matching result is about 0.95.
I cannot decrease my threshold below 0.98 because some other similar images have the same line thickness.
Here are example images:
So what options do I have?
I have tried:
dilate the original and template
erode also both
morphologyEx both
calculating keypoints and comparing them
finding corners
But no big success yet. Are these images too simple, so that detecting "good features" is hard?
Any help is very welcome.
Thank you!
EDIT:
Here are some other example images. The ones my program considers similar are put in the same zip folder.
ZIP
A possible approach might be thinning the two images so that every line is one pixel wide, since the differing thickness is causing your main problem with similarity.
The procedure would be to first binarize/threshold the images, then apply a thinning operation to both so that they have the same stroke thickness of 1 px, and finally use the usual template matching that gave you good results before.
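A short sketch of that pipeline, assuming an opencv-contrib build (which provides cv2.ximgproc.thinning) and placeholder file names:

import cv2

def thin_and_match(image_path, template_path):
    # Binarize both images (foreground = white) before thinning.
    img = cv2.threshold(cv2.imread(image_path, cv2.IMREAD_GRAYSCALE),
                        0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    tpl = cv2.threshold(cv2.imread(template_path, cv2.IMREAD_GRAYSCALE),
                        0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    # Reduce every stroke to a 1 px skeleton so line thickness no longer matters.
    img = cv2.ximgproc.thinning(img)
    tpl = cv2.ximgproc.thinning(tpl)
    # The same normalised cross-correlation matching as before.
    result = cv2.matchTemplate(img, tpl, cv2.TM_CCORR_NORMED)
    return cv2.minMaxLoc(result)[1]  # best matching score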
In case you'd like more details on the thinning/skeletonization of binary images here are a few OpenCV implementations posted on various discussion forums and OpenCV groups:
OpenCV code for thinning (Guo and Hall algo, works with CvMat inputs)
The JR Parker implementation using OpenCV
Possibly more efficient code here (uses OpenCV optimized access methods a lot, however most of the page is in Japanese!)
And lastly a brief overview of thinning in case you're interested.
You need something more elementary here; there isn't much reason to go for fancy methods. Your figures are already binary, and their shapes are very similar overall.
One initial idea: consider the upper points and bottom points in a given image and form an upper hull and a bottom hull (simply a hull, not a convex hull or anything else). A point is said to be an upper point (resp. bottom point) if, for a given column i, it is the first point starting at the top (bottom) of the image that is not a background point in column i. Also, your image is mostly one single connected component (in some cases there are separated vertical bars, but that is fine), so you can discard small components easily. This step is important for your situation because some of your figures contain a form of noise that is irrelevant to the rest of the image. Considering a connected component with fewer than 100 points to be small, these are the hulls you get for the respective images included in the question:
The blue line indicates the upper hull, the green line the bottom hull. If it is not apparent, when we consider the regional maxima and regional minima of these hulls we obtain the same amount in both of them. Furthermore, they are all very close except for some displacement along the y axis. If we take the mean x position of the extrema and plot the lines of both images together, we get the following figure. Here the blue and green lines belong to the second image, and the red and cyan lines to the first. Red dots mark the mean x coordinate of some regional minima, and blue dots the same for regional maxima (these are our points of interest). (The following image has been resized for better visualization.)
As you can see, you get many nearly overlapping points without doing anything. We can do even less, i.e. not even care about this overlap, and classify your images in the trivial way: an image a and an image b belong to the same class if they have the same number of regional maxima and regional minima in the upper hull, and the same number of regional maxima and regional minima in the bottom hull. Doing this for all your images, every image is correctly grouped except for the following situation:
In this case we have only 3 maxima and 3 minima for the upper hull in the first image, while there are 4 maxima and 4 minima for the second. Following you see the plots for the hulls and points of interest obtained:
As you can notice, in the second upper hull there are two extrema very close together. Smoothing this curve eliminates both of them, making the images match under the trivial method. Also note that if you draw a rectangle around your images, this method will say they are all equal. In that case you would want to compare multiple hulls, discarding the points in the current hull and constructing new ones. Nevertheless, this method is able to group all your images correctly, given that they are all very simple and mostly noise-free.
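A rough sketch of the hull extraction and the trivial count-based comparison, assuming each input is a binary numpy array with foreground pixels > 0 (all names are placeholders, and the small-component removal step is left out):

import numpy as np

def hulls(bin_img):
    # For each column that contains foreground, record the first foreground row
    # from the top (upper hull) and from the bottom (bottom hull).
    cols = [c for c in range(bin_img.shape[1]) if bin_img[:, c].any()]
    upper = np.array([np.argmax(bin_img[:, c] > 0) for c in cols])
    bottom = np.array([bin_img.shape[0] - 1 - np.argmax(bin_img[::-1, c] > 0) for c in cols])
    return upper, bottom

def count_extrema(profile):
    # Count regional maxima and minima via sign changes of the slope.
    d = np.sign(np.diff(profile.astype(float)))
    d = d[d != 0]                      # ignore flat runs
    changes = np.diff(d)
    return int((changes < 0).sum()), int((changes > 0).sum())  # (maxima, minima)

def same_class(img_a, img_b):
    ua, ba = hulls(img_a)
    ub, bb = hulls(img_b)
    return count_extrema(ua) == count_extrema(ub) and count_extrema(ba) == count_extrema(bb)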
As far as I can tell, the difficulty is when the shape is the same and only the size differs. A simple hack approach could be:
- subtract the images, then erode. If the shapes were the same but one slightly bigger, subtracting will leave only the edges, which will be thin and vanish with erosion as noise.
A somewhat more formal approach would be to take the contours, then the approximate polygons, and do an invariant comparison (Hu moments, etc.).
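A small sketch of that contour/invariant comparison with OpenCV's Python bindings (cv2.matchShapes compares contours through their Hu moments); the file names are placeholders:

import cv2

def shape_distance(path_a, path_b):
    def largest_contour(path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
        # [-2] selects the contour list in both the OpenCV 3.x and 4.x return conventions.
        contours = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
        return max(contours, key=cv2.contourArea)
    a, b = largest_contour(path_a), largest_contour(path_b)
    # Hu-moment based distance: lower means more similar; invariant to scale and translation.
    return cv2.matchShapes(a, b, cv2.CONTOURS_MATCH_I1, 0)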
First of all, I very much appreciate the help provided by the experts here at SO. The questions posed by many and answered by the experts have been of immense benefit to me. They helped me with a very crucial problem a few months back when I was a student doing my thesis.
Right now I am working on a problem of detecting (and then recognizing) numbers in a complex scene image. You can check out these images here: http://imageshack.us/g/823/dsc1757w.jpg/. These are pictures of marathon runners with their numbers on the front of their shirts. I have to detect all the numbers that appear in the image and then recognize them. The recognition won't be difficult, as these appear to be OCR-friendly characters. The crucial thing is how to detect these numbers.
I had an idea to first color-filter for black. But when I tried it in Matlab, the results were not encouraging, as many regions in the image meet this criterion (the clothes, some shadows behind the runners, the shadows in the foliage, etc.). Either I need to separate these characters from those other regions or I need some other good technique.
There are papers available and I have gone through some of them, like the SWT, DWT, etc., but I have a feeling they won't be of much help. I was thinking some kind of training algorithm might be useful. There is another reason for this: in the future there might be other photos, possibly with different fonts, so I think a dedicated algorithmic approach might fail. Can anyone point me in the right direction?
I am not a novice in image processing, but not an expert either. So, any and all help/suggestion in this regard will be greatly appreciated :) .
Thanks,
MD
You know that your problem is not a simple one, but it seems very interesting!
Although I don't have any solutions for you, I will just share my thoughts in hope that you can make something out of it.
Let's take 2 of your photos as examples:
Photo-A: http://imageshack.us/photo/my-images/59/dsc0275a.jpg/
It shows a single person with a relatively "big" green label with numbers on his shirt.
Photo-B: http://imageshack.us/photo/my-images/546/dsc0243u.jpg/
It shows a lot of people with smaller red labels on their shirts.
(The labels' height in pixels is about 1/5 of the label in Photo-A.)
Considering the above photos, I will try to write some random thoughts which may help...
(a) Define your scale: There is no point in applying a search algorithm to find labels from 2x2 pixels up to the full image resolution. You must define minimum/maximum limits for the width and height of a label. Those limits may depend on several factors:
(1) One factor is the real size of the labels (determined by the distance of the people from the camera), which can be defined as a percentage of the image width and height.
(2) Another factor is the actual reading accuracy of the OCR you are going to use. If the numbers' image height is smaller than Y1 pixels or bigger than Y2 pixels, the OCR will not be able to read them (it sounds strange but it's true: a big image may look very clear to the human eye, yet an OCR may have problems reading it).
(b) Find the area(s) of interest: In your case, this is equivalent to "Find the approximate position of labels". We can define an athlete label roughly as "An (almost) rectangular area, which may be a bit inclined relative to photo borders, and contains: A central area of black + color C1 [e.g. red or green] + a white (=neutral) area on top and/or bottom of it".
A possible algorithm to find the approximate position of a label is (a rough code sketch of the first steps follows the list):
(1) Traverse the whole image left-to-right, top-to-bottom and examine a square area of MinHeight/2 x MinHeight/2
(2) Create the histogram of the square area (or posterize it e.g. to 8 levels) and try to find if there is only Black + Another color C1 in a percentage of e.g. Black: 40% +/- 10, Color: 60% +/- 10%
(3) If (2) is true try to expand the area to Right and Bottom while the percentages are kept in the specified limits
(4) If the square is fully expanded, check if the expanded area size is inside the min/max limits of width/height you specified in (a). If not, go to step 1
(5) Process the expanded area to read the numbers - see (c) below
(6) Go to step 1
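A rough sketch of steps (1)-(2), with (3)-(4) only indicated as comments; it assumes an OpenCV BGR image, a placeholder minimum label height, and checks only the black percentage (the colour-C1 check is omitted for brevity):

import cv2
import numpy as np

MIN_H = 24  # assumed minimum label height in pixels

def is_label_like(patch):
    # Posterize to a few levels, then check the black percentage ("40% +/- 10").
    poster = (patch // 32) * 32
    gray = cv2.cvtColor(poster, cv2.COLOR_BGR2GRAY)
    black_ratio = np.mean(gray < 64)
    return 0.30 <= black_ratio <= 0.50

def scan_for_labels(img):
    step = MIN_H // 2
    candidates = []
    for y in range(0, img.shape[0] - step, step):
        for x in range(0, img.shape[1] - step, step):
            if is_label_like(img[y:y + step, x:x + step]):
                # Real code would expand right/down while the percentages hold (step 3)
                # and validate the expanded size against the min/max limits (step 4).
                candidates.append((x, y))
    return candidates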
(c) Process the area(s) of interest: Try the following steps (a short sketch follows the list):
(1) Convert each image-area to grayscale by applying a color filter that burns color C1 to white.
(2) Equalize the Grayscale to make the black letters stand-out
(3) If an inclination has been detected, perform a reverse rotation on the image-area to make the letters as horizontal as possible.
(4) Feed the area to an OCR trained only for numbers
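A brief sketch of steps (1)-(2), under the assumption that the label colour C1 is red and that keeping only the red channel "burns" it towards white (a real filter would depend on the actual label colour):

import cv2

def prepare_for_ocr(label_bgr):
    # Step (1): a red label is bright in the red channel, so keeping only that channel
    # pushes the red background towards white while the black digits stay dark.
    _, _, red = cv2.split(label_bgr)
    # Step (2): equalize so the black letters stand out before feeding the OCR.
    return cv2.equalizeHist(red)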
Good luck with your project!
You could try to contact the author of this software:
Yaroslav is an active member of StackOverflow.
I have 55 000 image files (in both JPG and TIFF format) which are pictures from a book.
The structure of each page is this:
some text
--- (horizontal line) ---
a number
some text
--- (horizontal line) ---
another number
some text
There can be from zero to 4 horizontal lines on any given page.
I need to find what the number is, just below the horizontal line.
BUT, the numbers strictly follow each other, starting at one on page one, so in order to find a number I don't need to read it: I can just detect the presence of the horizontal lines, which should be both easier and safer than trying to OCR the page to detect the numbers.
The algorithm would be, basically:
for each image
count horizontal lines
print image name, number of horizontal lines
next image
The question is: what would be the best image library/language to do the "count horizontal lines" part?
Probably the easiest way to detect your lines is using the Hough transform in OpenCV (which has wrappers for many languages).
The OpenCV Hough transform will detect all lines in the image and return their angles and start/stop coordinates. You should keep only the ones whose angles are close to horizontal and whose length is adequate.
O'Reilly's Learning OpenCV explains in detail the function's input and output (p.156).
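A small sketch of that approach with OpenCV's Python bindings, using cv2.HoughLinesP (the probabilistic variant, which returns line segments directly); the thresholds and file name are placeholders to tune:

import cv2
import numpy as np

def count_horizontal_lines(path, max_angle_deg=2.0, min_length_frac=0.3):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 50, 150)
    min_len = int(gray.shape[1] * min_length_frac)  # require a reasonably long line
    segments = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                               minLineLength=min_len, maxLineGap=10)
    if segments is None:
        return 0
    count = 0
    for x1, y1, x2, y2 in segments[:, 0]:
        angle = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
        if angle <= max_angle_deg or angle >= 180 - max_angle_deg:
            count += 1
    # Note: one physical line can produce several segments; merge nearby y values if needed.
    return count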
If you have good contrast, try running connected components and analyzing the result. It can be an alternative to finding lines through Hough, and it covers the case where your structural elements are a bit curved or the line algorithm picks up lines you don't want it to.
Connected components is a super fast, two-raster-scan algorithm and will give you a mask with all your connected elements marked with different labels and accounted for. You can discard anything short (in terms of aspect ratio). Overall, this can be more general and faster, but probably a bit more involved, than running a Hough transform. The Hough transform, on the other hand, is more tolerant of contrast artifacts and even accidental gaps in lines.
OpenCV has the function findContours() that finds the components for you.
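A short sketch using cv2.connectedComponentsWithStats and keeping only wide, thin components as horizontal rules; the width and aspect-ratio thresholds are assumptions to tune:

import cv2

def count_rule_components(path, min_width_frac=0.3, min_aspect=15):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    count = 0
    for i in range(1, n):  # label 0 is the background
        w = stats[i, cv2.CC_STAT_WIDTH]
        h = stats[i, cv2.CC_STAT_HEIGHT]
        # A horizontal rule is much wider than tall and spans a good part of the page width.
        if w >= gray.shape[1] * min_width_frac and w / max(h, 1) >= min_aspect:
            count += 1
    return count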
You might want to try John Resig's "OCR and Neural Nets in JavaScript".