Character Segmentation fails on 'W' - opencv

I am using OpenCV for OCR of printed codes containing arbitrary characters and numbers. My pipeline involves thresholding, denoising, gradient morphology, and then finding the contours in order to place a bounding box around each letter.
It works very well except when a 'W' appears in the code. It usually places 2 or 3 bounding boxes such that the prediction is "VAV" or "VV", which is honestly a mistake my own eyes might make when I'm tired.
Does anyone have any ideas how best to address this issue? I can be fairly certain that a 'W' appears in many of these codes and it needs to be segmented properly. Thanks for any help!

I found a solution that worked, at least in my case. I iteratively compute the relative overlap of neighboring bounding boxes, combining bounding boxes that have an overlap greater than a specified threshold (0.15 in my case). This works very effectively for my data.
Here is a crop as an example:
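For reference, here is a minimal sketch of that merge step in Python (the 0.15 threshold is the one from my data; the overlap definition and helper names are just one way to implement it):

```python
def overlap_ratio(a, b):
    # Relative horizontal overlap of two (x, y, w, h) boxes,
    # measured against the narrower of the two.
    ax1, ax2 = a[0], a[0] + a[2]
    bx1, bx2 = b[0], b[0] + b[2]
    inter = max(0, min(ax2, bx2) - max(ax1, bx1))
    return inter / float(min(a[2], b[2]))

def merge_boxes(boxes, thresh=0.15):
    # Iteratively fuse neighboring boxes whose overlap exceeds thresh.
    boxes = sorted(boxes)          # left to right
    changed = True
    while changed:
        changed, out = False, []
        for b in boxes:
            if out and overlap_ratio(out[-1], b) > thresh:
                last = out[-1]
                x1 = min(last[0], b[0])
                y1 = min(last[1], b[1])
                x2 = max(last[0] + last[2], b[0] + b[2])
                y2 = max(last[1] + last[3], b[1] + b[3])
                out[-1] = (x1, y1, x2 - x1, y2 - y1)
                changed = True
            else:
                out.append(b)
        boxes = out
    return boxes
```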

Try running a morphological dilation on the image before finding the contours. This will help you merge all the parts of the letter W as a single blob, preventing it from being recognized as multiple letters.
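A rough sketch of that idea (the kernel size and iteration count are guesses you will need to tune against the gap width in your font):

```python
import cv2
import numpy as np

img = cv2.imread("code.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
kernel = np.ones((3, 3), np.uint8)
merged = cv2.dilate(binary, kernel, iterations=2)   # fuse the strokes of the 'W'
# OpenCV 4.x returns (contours, hierarchy); 3.x returns an extra image first
contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
boxes = [cv2.boundingRect(c) for c in contours]
```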

Related

Recognition and counting of books from side using OpenCV

I just wish to receive some ideas on how I can solve this problem.
For a clearer picture, here are examples of some of the images that we are looking at:
I have tried thresholding (e.g., Otsu), blob detection, etc. However, I am still unable to segment out the books and count them properly. Hardcovers are easy, of course, as the cover clearly separates the books, but when it comes to softcovers, I have not been able to successfully count the number of books.
Does anybody have any suggestions on what I can do? Any help will be greatly appreciated. Thanks.
I ran a Sobel edge detector and used the Hough transform to detect lines on the last image, and it seemed to work okay for me. You can then link the edges in the output of the Sobel edge detector and count the number of horizontal lines. Or you can do the same on the lines detected by Hough.
You can further narrow down the area of interest by converting the image into a binary image. The outputs of all of these operators can be seen in the following figure (I couldn't upload an image, so I had to host it here): http://www.pictureshoster.com/files/v34h8hvvv1no4x13ng6c.jpg
Refer to http://www.mathworks.com/help/images/analyzing-images.html#f11-12512 for some more useful examples of edge, line and corner detection.
Hope this helps.
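Edit: in case a concrete starting point helps, here is a minimal sketch of the Sobel + Hough step in OpenCV's Python API (the file name and all threshold values are my own guesses):

```python
import cv2
import numpy as np

img = cv2.imread("books.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
edges = cv2.Sobel(img, cv2.CV_8U, 0, 1, ksize=3)     # y-derivative: horizontal edges
_, edges = cv2.threshold(edges, 60, 255, cv2.THRESH_BINARY)
lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=150)
# keep only lines whose angle is close to horizontal (theta ~ 90 degrees)
horizontal = [l for l in (lines[:, 0] if lines is not None else [])
              if abs(l[1] - np.pi / 2) < 0.05]
print("candidate book edges:", len(horizontal))
```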
I think that #audiohead's recommendation is good, but you should be careful when applying the Hough transform to images that have the library's stamp, as it might be confused with another book (you can see that the letters form some break lines that will be detected by Sobel).
Consider first applying an edge-preserving smoothing algorithm such as a bilateral filter. When tuned correctly (i.e., the kernel settings), it can avoid such problems.
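For example, a minimal sketch of that pre-processing step (the d and sigma values are common starting points to tune, not specific recommendations):

```python
import cv2

img = cv2.imread("books.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
# Edge-preserving smoothing: flattens the stamp and page texture
# while keeping the book edges sharp.
smoothed = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
```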
A Different Solution That Might Work (But can be slow)
Here is a different approach that is based on a pixel-marking strategy.
a) Based on some very dark threshold, mark all black pixels as visited.
b) While there are unvisited pixels: pick the next unvisited pixel and apply a region-growing algorithm (http://en.wikipedia.org/wiki/Region_growing), marking its pixels with a unique number. At this stage you will need to analyse the geometric shape that the region forms. A good criterion for detecting a book is that the region forms some kind of rectangle where width >> height. This will detect a book and mark all its pixels with the unique number.
Once there are no more unvisited pixels, the number of unique labels is the number of books. In addition, for each pixel in your image you will now know to which book it belongs.
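A sketch of this strategy, using cv2.floodFill as the region grower (the darkness threshold and the width >> height test values are placeholders to tune):

```python
import cv2
import numpy as np

img = cv2.imread("books.jpg", cv2.IMREAD_GRAYSCALE)      # hypothetical input
_, dark = cv2.threshold(img, 60, 255, cv2.THRESH_BINARY_INV)  # step a)

work = dark.copy()
book_count = 0
for y, x in np.argwhere(work == 255):                    # step b)
    if work[y, x] != 255:
        continue                                         # already absorbed earlier
    # Grow the region from this seed, erasing it (fill with 0) as we go;
    # rect is the bounding box (x, y, w, h) of the grown region.
    _, _, _, rect = cv2.floodFill(work, None, (int(x), int(y)), 0)
    rw, rh = rect[2], rect[3]
    if rw > 3 * rh and rw > 50:                          # "width >> height" criterion
        book_count += 1
print("books:", book_count)
```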
Do you have to keep the books this way? If you can turn the books so their back sides face the camera, then I think you can get more information from the different colors used by different books. The lines found by the Hough transform or edge detection will be more prominent this way.
There exist more sophisticated methods that are much better at contour detection and segmentation; you can have a look at them here (however, they are quite slow): http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html
Once you get the ultrametric contour map, you can perform some computation on it to count the number of books.
I would try a completely different approach: with paperbacks, the covers are medium-dark lines, whilst the rest (assuming white pages) is fairly white and "bloomed", so I'd try to thicken up the dark edges to make them easy to detect. That would give you edges akin to working with hardbacks, which you say you've already handled.
I'd try something like an erosion to thicken up the edges. This would be a nice, fast operation.

How to get position (x,y) and number of particular objects or shapes in a hand-drawn image?

First: I've been learning image processing, neural networks, etc. by myself for just a couple of weeks, so I'm really new and really far from pro. And sorry for my bad English.
There's an image (a photo of my drawing). I want to get the coordinates of each object/shape (the black dots) and the number next to it; the number indicates the sequence number of the dot.
How do I get that? How do I detect the dots? Shape recognition for the dots? Handwritten-digit recognition for the numbers? Then segmentation to get the positions? Or should I use template matching? But every dot has a slightly different shape because of the hand drawing. Should I use a neural network? In a NN, there is usually one input neuron per pixel to recognize a character, right? Can I use a picture of a character or a drawn dot as the input to recognize my whole picture?
I'm very new, so I really need your advice; correct me if I'm wrong! Please tell me what I must learn, what I must do, and what I must use.
Thank you very much. :'D
This is a difficult problem which can't be solved by a quick solution.
Here is how I would approach it:
Get a better picture. Your image is very noisy and is taken in low light with high ISO. Use a better camera and better lighting conditions so you can get the background to be as white as possible and the dots as black as possible. Try to maximize the contrast.
Threshold the image so that all the background is white and the dots and numbers are black. Maybe you could apply some erosion and/or dilation to help connect the dark edges together.
Detect the rectangle somehow and set your work area to be inside it (crop the rest of the image away so that you are left with the area inside the rectangle). You could do this by finding the contours in the image: the contour with the largest area is the rectangle (because it's the largest object in the image). Of course, this is not the only way. See this: OpenCV find contours
Once you are left with only the dots, circles and numbers, you need to find a way to detect them and discriminate between them. You could again find all the contours (or maybe you've already found them all in the previous step). You need to figure out a way to tell whether a certain contour is a circle, a filled circle (dot) or a number. This is a problem in its own right. Maybe you could count the white/black pixels in the contour's bounding box: dots have more black pixels than circles and numbers. You also need to do something about numbers that touch dots (like the number 5 in your image).
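A sketch of steps 3 and 4 under these assumptions (the file name, the 5 px margin and the 0.6 fill cutoff are guesses):

```python
import cv2
import numpy as np

img = cv2.imread("drawing.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Step 3: assume the contour with the largest area is the surrounding rectangle
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
inside = binary[y + 5:y + h - 5, x + 5:x + w - 5]       # crop away the frame itself

# Step 4: classify each remaining blob by how "filled" its bounding box is;
# dots are mostly ink, circles and digits mostly background.
contours, _ = cv2.findContours(inside, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    cx, cy, cw, ch = cv2.boundingRect(c)
    fill = np.count_nonzero(inside[cy:cy + ch, cx:cx + cw]) / float(cw * ch)
    print((cx, cy), "dot" if fill > 0.6 else "circle/number")
```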
Once you know what is a dot, a circle or a number, you could use an OCR library (Tesseract or any other OCR lib) to try to recognize the numbers. You could also use a neural network library (maybe trained on the MNIST dataset) to recognize the digits. A good choice would be a convolutional neural network similar to LeNet-5.
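If you go the Tesseract route, a minimal sketch via the pytesseract wrapper might look like this (the wrapper choice and config string are my own; --psm 10 treats the crop as a single character):

```python
import cv2
import pytesseract  # one of several Tesseract bindings; pick whichever you prefer

crop = cv2.imread("digit_crop.png", cv2.IMREAD_GRAYSCALE)  # hypothetical crop
text = pytesseract.image_to_string(
    crop, config="--psm 10 -c tessedit_char_whitelist=0123456789")
print(text.strip())
```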
As you can see, this is a problem that requires many different steps to solve, and many different components are involved. The steps I suggested might not be the best, but with some work I think it can be solved.

OCR detection with OpenCV

I'm trying to create a simple OCR engine using OpenCV. I have this image: https://dl.dropbox.com/u/63179/opencv/test-image.png
I have saved all possible characters as images and am trying to detect these images in the input image.
From here I need to identify the code. I have been trying matchTemplate and FAST detection. Both seem to fail (or, more likely, I'm doing something wrong).
When I used the matchTemplate method, I found the edges of both the input image and the reference images using Sobel. This provides a working result, but the accuracy is not good enough.
When using the FAST method, it seems like I can't get any interesting descriptors from the cvExtractSURF method.
Any recommendations on the best way to read this kind of code?
UPDATE 1 (2012-03-20)
I have made some progress. I'm trying to find the bounding rects of the characters, but the matrix font is killing me. See the samples below:
My font: https://dl.dropbox.com/u/63179/opencv/IMG_0873.PNG
My font filled in: https://dl.dropbox.com/u/63179/opencv/IMG_0875.PNG
Other font: https://dl.dropbox.com/u/63179/opencv/IMG_0874.PNG
As seen in the samples, I can find the bounding rects for a less complex font, and if I can fill in the space between the dots of my font, it also works. Is there a way to achieve this with OpenCV? If I can find the bounding box of each character, it will be much simpler to recognize the character.
Any ideas?
UPDATE 2 (2012-03-21)
Ok, I had some luck with finding the bounding boxes. See image:
https://dl.dropbox.com/u/63179/opencv/IMG_0891.PNG
I'm not sure where to go from here. I tried to use matchTemplate, but I guess that is not a good option in this case? I guess it is better suited to searching for an exact match in a bigger picture?
I tried to use SURF, but when I try to extract the descriptors with cvExtractSURF for each bounding box, I get 0 descriptors... Any ideas?
What method would be the most appropriate for matching a bounding box against a reference image?
You're going the hard way with FAST+SURF, because they were not designed for this task.
In particular, FAST detects corner-like features that are ubiquitous in structure-from-motion but far less present in OCR.
Two suggestions:
maybe build a feature vector from the number and locations of FAST keypoints; I think that you can rapidly check whether these features are discriminant enough, and if so, train a classifier on them
(the one I would choose myself) partition your image samples into smaller squares. Compute only the SURF descriptor for each square and concatenate all of them to form the feature vector for a given sample. Then train a classifier with these feature vectors.
Note that option 2 works with any descriptor that you can find in OpenCV (SIFT, SURF, FREAK...).
Answer to update 1
Here is a little trick that senior people taught me when I started.
On your image with the dots, you can project your binarized data onto the horizontal and vertical axes.
By searching for holes (disconnections) in the projected profiles, you are likely to recover almost all the bounding boxes in your example.
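A sketch of the trick for the x axis (min_gap is a guess: it has to be larger than the spacing between the dots of the matrix font but smaller than the spacing between characters):

```python
import cv2
import numpy as np

img = cv2.imread("IMG_0873.PNG", cv2.IMREAD_GRAYSCALE)  # your dotted-font sample
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

col_profile = binary.sum(axis=0)            # projection onto the horizontal axis
cols = np.flatnonzero(col_profile > 0)      # columns that contain ink

# Wide runs of empty columns are character breaks; narrow ones are just the
# spaces between the dots of the matrix font.
min_gap = 3
splits = np.flatnonzero(np.diff(cols) > min_gap)
segments = np.split(cols, splits + 1)
x_ranges = [(int(s[0]), int(s[-1]) + 1) for s in segments]
print(x_ranges)   # one (x0, x1) per character; repeat along y for full boxes
```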
Answer to update 2
At this point, you're back to my initial answer: SURF will do no good here.
Instead, a standard way is to binarize each bounding box (to 0/1 depending on background/letter), normalize the bounding boxes to a standard size, and train a classifier from there.
There are several tutorials and blog posts on the web about how to do digit recognition using neural networks or SVMs; you just have to replace the digits with your letters.
Your work is almost done! Training and using a classifier is tedious but straightforward.
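For illustration, a k-NN version using OpenCV's own ml module (train_crops, train_labels and test_crop stand for your labelled character images; the 20x20 size is a common choice, not a requirement):

```python
import cv2
import numpy as np

def to_feature(crop, size=20):
    # Binarize, normalize to a fixed size, and flatten to a feature vector.
    _, b = cv2.threshold(crop, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return cv2.resize(b, (size, size)).astype(np.float32).ravel() / 255.0

# train_crops, train_labels, test_crop: placeholders for your own data
samples = np.vstack([to_feature(c) for c in train_crops])
responses = np.array(train_labels, np.float32).reshape(-1, 1)

knn = cv2.ml.KNearest_create()
knn.train(samples, cv2.ml.ROW_SAMPLE, responses)
_, result, _, _ = knn.findNearest(to_feature(test_crop).reshape(1, -1), k=3)
print(int(result[0][0]))   # predicted class
```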

OpenCV: comparing simple images with small difference

I have a bunch of "simple" images and I want to compare whether they are similar. I compare them to each other using template matching (cv::matchTemplate) and the results are quite good.
Now I want to fine-tune my program, and I face a problem. For example, I have two images which look very much alike. The only differences are that one has a thicker line and that the digit in front of the item is different. When both images are small, a one-pixel difference in line thickness makes a big difference in the template-matching result. When the line thicknesses are the same and the only difference is the front digit, I get a matching result of about 0.98 with CV_TM_CCORR_NORMED on a successful match. When the line thickness differs, the matching result is about 0.95.
I cannot decrease my threshold value below 0.98 because some other similar images have same line thickness.
Here are example images:
So what options do I have?
I have tried:
dilate the original and template
erode also both
morphologyEx both
calculating keypoints and comparing them
finding corners
But no big success yet. Are those images so simple that detecting "good features" is hard?
Any help is very welcome.
Thank you!
EDIT:
Here are some other example images. What my program considers similar is put in the same zip folder.
ZIP
A possible way might be thinning the two images, so that every line is one pixel wide, since the differing thickness is causing your main problem with similarity.
The procedure would be to first binarize/threshold the images, then apply a thinning operation on both images, so both now have the same stroke thickness of 1 px. Then use the usual template matching that gave you good results before.
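A sketch of that procedure; note that cv2.ximgproc.thinning lives in the opencv-contrib package (the Guo-Hall implementation linked below is an alternative if you cannot use contrib):

```python
import cv2

def skeleton(gray):
    _, b = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return cv2.ximgproc.thinning(b)      # every stroke becomes 1 px wide

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)      # hypothetical inputs
tmpl = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
res = cv2.matchTemplate(skeleton(img), skeleton(tmpl), cv2.TM_CCORR_NORMED)
print(res.max())   # best score after thickness has been normalized away
```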
In case you'd like more details on the thinning/skeletonization of binary images here are a few OpenCV implementations posted on various discussion forums and OpenCV groups:
OpenCV code for thinning (Guo and Hall algo, works with CvMat inputs)
The JR Parker implementation using OpenCV
Possibly more efficient code here (uses OpenCV optimized access methods a lot, however most of the page is in Japanese!)
And lastly a brief overview of thinning in case you're interested.
You need something more elementary here; there isn't much reason to go for fancy methods. Your figures are already binary, and their shapes are very similar overall.
One initial idea: consider the upper points and bottom points in a given image and form an upper hull and a bottom hull (simply a hull, not a convex hull or anything else). A point is said to be an upper point (resp. bottom point) if, given a column i, it is the first point that is not a background point when starting from the top (bottom) of the image in column i. Also, your image is mostly one single connected component (in some cases there are separate vertical bars, but that is fine), so you can discard small components easily. This step is important for your situation because I saw that some figures have some form of noise that is irrelevant to the rest of the image. Treating any connected component with fewer than 100 points as small, these are the hulls you get for the respective images included in the question:
The blue line indicates the upper hull, the green line the bottom hull. If it is not apparent: when we consider the regional maxima and regional minima of these hulls, we obtain the same number of them in both images. Furthermore, they are all very close except for some displacement along the y axis. If we take the mean x position of each extremum and plot the lines of both images together, we get the following figure. In this case, the lines in blue and green are for the second image, and the lines in red and cyan for the first. Red dots mark the mean x coordinate of the regional minima, and blue dots the same for the regional maxima (these are our points of interest). (The following image has been resized for better visualization.)
As you can see, you get many nearly overlapping points without doing anything. If we do even less, i.e. do not even care about this overlap, we can classify your images in the trivial way: if an image a and an image b have the same number of regional maxima in the upper hull, the same number of regional minima in the upper hull, the same number of regional maxima in the bottom hull, and the same number of regional minima in the bottom hull, then a and b belong to the same class. Doing this for all your images, all of them are correctly grouped except for the following situation:
In this case we have only 3 maxima and 3 minima for the upper hull in the first image, while there are 4 maxima and 4 minima in the second. Below are the plots of the hulls and the points of interest obtained:
As you can notice, in the second upper hull there are two extrema very close together. Smoothing this curve eliminates both of them, making the images match under the trivial method. Also, note that if you draw a rectangle around your images, this method will say they are all equal. In that case you will want to compare multiple hulls, discarding the points in the current hull and constructing new ones. Nevertheless, this method is able to group all your images correctly, given that they are all very simple and mostly noise-free.
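For concreteness, here is a sketch of computing the two hulls and counting their extrema (component filtering and smoothing are omitted, and the sign-change counter lumps maxima and minima together; split it if you need the four counts separately):

```python
import cv2
import numpy as np

img = cv2.imread("figure.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
_, b = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

h, w = b.shape
upper = np.full(w, -1)
bottom = np.full(w, -1)
for i in range(w):                        # per column i
    rows = np.flatnonzero(b[:, i])        # foreground rows in this column
    if rows.size:
        upper[i] = rows[0]                # first foreground point from the top
        bottom[i] = rows[-1]              # first foreground point from the bottom

def extrema(profile):
    p = profile[profile >= 0]             # skip empty columns
    d = np.sign(np.diff(p))
    d = d[d != 0]                         # ignore flat stretches
    return int(np.sum(np.diff(d) != 0))   # sign changes = regional extrema

print(extrema(upper), extrema(bottom))
```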
From what I can gather, the difficulty is when the shape is the same and just the size is different. A simple hack approach could be:
- subtract the images, then erode. If the shapes were the same but one slightly bigger, subtracting will leave only the edges, which will be thin and vanish with erosion, as noise does.
Somewhat more formal would be to take the contours and then the approximate polygons and do an invariant comparison (Hu moments, etc.).
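OpenCV exposes the Hu-moment comparison directly as matchShapes, which accepts either contours or grayscale images; a minimal sketch:

```python
import cv2

a = cv2.imread("img_a.png", cv2.IMREAD_GRAYSCALE)   # hypothetical inputs
b = cv2.imread("img_b.png", cv2.IMREAD_GRAYSCALE)
# Lower scores mean more similar shapes, invariant to scale and rotation.
score = cv2.matchShapes(a, b, cv2.CONTOURS_MATCH_I1, 0.0)
print(score)
```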

Image processing / super light OCR

I have 55 000 image files (in both JPG and TIFF format) which are pictures from a book.
The structure of each page is this:
some text
--- (horizontal line) ---
a number
some text
--- (horizontal line) ---
another number
some text
There can be from zero to 4 horizontal lines on any given page.
I need to find what the number is, just below the horizontal line.
BUT, numbers strictly follow each other, starting at one on page one, so in order to find the number, I don't need to read it: I could just detect the presence of horizontal lines, which should be both easier and safer than trying to OCR the page to detect the numbers.
The algorithm would be, basically:
for each image
    count horizontal lines
    print image name, number of horizontal lines
next image
The question is: what would be the best image library/language to do the "count horizontal lines" part?
Probably the easiest way to detect your lines is using the Hough transform in OpenCV (which has wrappers for many languages).
The OpenCV Hough transform will detect all lines in the image and return their angles and start/stop coordinates. You should only keep the ones whose angles are close to horizontal and that are of adequate length.
O'Reilly's Learning OpenCV explains in detail the function's input and output (p.156).
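A sketch of that filtering using the probabilistic variant, which returns segment endpoints directly (all threshold values are guesses to tune):

```python
import cv2
import numpy as np

img = cv2.imread("page_0001.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical page scan
edges = cv2.Canny(img, 50, 150)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=200,
                        minLineLength=img.shape[1] // 2, maxLineGap=20)

# Collect the y position of every long, near-horizontal segment, then merge
# detections that refer to the same physical line (within 5 px).
ys = []
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        if abs(y2 - y1) <= 2:
            ys.append((y1 + y2) // 2)
ys.sort()
count = sum(1 for i, y in enumerate(ys) if i == 0 or y - ys[i - 1] > 5)
print("horizontal lines:", count)
```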
If you have good contrast, try running connected components and analyzing the result. It can be an alternative to finding lines through Hough, and it covers the case when your structural elements are a bit curved or when the line algorithm picks up lines you don't want it to pick up.
Connected components is a super fast, two-raster-scan algorithm that will give you a mask with all your connected elements marked with different labels and accounted for. You can discard anything short (in terms of aspect ratio). Overall, this can be more general and faster, but probably a bit more involved than running the Hough transform. The Hough transform, on the other hand, will be more tolerant of contrast artifacts and even accidental gaps in lines.
OpenCV has the function findContours(), which finds the components for you.
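And a sketch of the connected-components variant (the aspect-ratio cutoffs are placeholders):

```python
import cv2

img = cv2.imread("page_0001.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical page scan
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)

count = 0
for i in range(1, n):                        # label 0 is the background
    x, y, w, h, area = stats[i]
    if w > img.shape[1] // 2 and w > 20 * h:   # long and flat = horizontal rule
        count += 1
print("horizontal lines:", count)
```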
You might want to try John Resig's OCR and Neural Nets in JavaScript.
