OCR: segmentation of small text - opencv

The problem
I've been building a (very) simple OCR engine.
Since I'm trying to classify very small (pixel size) characters, I'm having some difficulties on segmentation. Here's an example, after best-effort image-wide thresholding:
:
What I've tried
Error detection:
large horizontal size of the segments. It works, mostly, but fails (false positive)
for a few larger characters.
classify, and reject on low score. This seems a bit wasteful.
Error correction:
add pixels vertically (vertical histogram), find minimum. It cuts many segments on the wrong place, in many of the samples.
What I haven't tried yet
Trying to classify on all possible segmentation points (pixels). This would be very wasteful, and be difficult to expand for a 3-merged-characters segment.
I've been reading up on morphology approaches to turn the characters into mathematical curves, but I don't know really know where to start, or if it's worth the effort
Where to go from here?
I have no idea. Hence this question :)

Lean back and half close your eyes.
63 :-)
Now, if only it was so easy for a computer!
It's tantalisingly close to what double-patterning does (or un-does?) in silicon masks.
I would suggest oversampling (doubling or quadrupling the pixel count in each axis), filtering (probably low pass - or possibly bandpass where the passband = spatial frequency of a line), re-thresholding until they separate. Expensive, so only apply in problem areas.

Reinvent your problem so you do not need segmentation.
Really, for this scale I think you better invest in other approaches. For example, if you OCR on text (do you?) you can use the information of lines (character height). There are not many fonts that can be used for small (yet readable) characters. My approach would be a algorithm that scan lines in scanlines (from left to right, take pixels from top to bottom) and try to find correlations between trained text and scanlines (n, n-1... n-x)
And you probably need the information I the grayscale levels as well, so better not to threshold the images.

Related

Water Edge Detection

Is there a robust way to detect the water line, like the edge of a river in this image, in OpenCV?
(source: pequannockriver.org)
This task is challenging because a combination of techniques must be used. Furthermore, for each technique, the numerical parameters may only work correctly for a very narrow range. This means either a human expert must tune them by trial-and-error for each image, or that the technique must be executed many times with many different parameters, in order for the correct result to be selected.
The following outline is highly-specific to this sample image. It might not work with any other images.
One bit of advice: As usual, any multi-step image analysis should always begin with the most reliable step, and then proceed down to the less reliable steps. Whenever possible, the less reliable step should make use of the result of more-reliable steps to augment its own accuracy.
Detection of sky
Convert image to HSV colorspace, and find the cyan located at the upper-half of the image.
Keep this HSV image, becuase it could be handy for the next few steps as well.
Detection of shrubs
Run Canny edge detection on the grayscale version of image, with suitably chosen sigma and thresholds. This will pick up the branches on the shrubs, which would look like a bunch of noise. Meanwhile, the water surface would be relatively smooth.
Grayscale is used in this technique in order to reduce the influence of reflections on the water surface (the green and yellow reflections from the shrubs). There might be other colorspaces (or preprocessing techniques) more capable of removing that reflection.
Detection of water ripples from a lower elevation angle viewpoint
Firstly, mark off any image parts that are already classified as shrubs or sky. Since shrub detection would be more reliable than water detection, shrub detection's result should be used to inform the less-reliable water detection.
Observation
Because of the low elevation angle viewpoint, the water ripples appear horizontally elongated. In fact, every image feature appears stretched horizontally. This is called Anisotropy. We could make use of this tendency to detect them.
Note: I am not experienced in anisotropy detection. Perhaps you can get better ideas from other people.
Idea 1:
Use maximally-stable extremal regions (MSER) as a blob detector.
The Wikipedia introduction appears intimidating, but it is really related to connected-component algorithms. A naive implementation can be done similar to Dijkstra's algorithm.
Idea 2:
Notice that the image features are horizontally stretched, a simpler approach is to just sum up the absolute values of horizontal gradients and compare that to the sum of absolute values of vertical gradients.

find mosquitos' head in the image

I have images of mosquitos similar to these ones and I would like to automatically circle around the head of each mosquito in the images. They are obviously in different orientations and there are random number of them in different images. some error is fine. Any ideas of algorithms to do this?
This problem resembles a face detection problem, so you could try a naïve approach first and refine it if necessary.
First you would need to recreate your training set. For this you would like to extract small images with examples of what is a mosquito head or what is not.
Then you can use those images to train a classification algorithm, be careful to have a balanced training set, since if your data is skewed to one class it would hit the performance of the algorithm. Since images are 2D and algorithms usually just take 1D arrays as input, you will need to arrange your images to that format as well (for instance: http://en.wikipedia.org/wiki/Row-major_order).
I normally use support vector machines, but other algorithms such as logistic regression could make the trick too. If you decide to use support vector machines I strongly recommend you to check libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/), since it's a very mature library with bindings to several programming languages. Also they have a very easy to follow guide targeted to beginners (http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf).
If you have enough data, you should be able to avoid tolerance to orientation. If you don't have enough data, then you could create more training rows with some samples rotated, so you would have a more representative training set.
As for the prediction what you could do is given an image, cut it using a grid where each cell has the same dimension that the ones you used on your training set. Then you pass each of this image to the classifier and mark those squares where the classifier gave you a positive output. If you really need circles then take the center of the given square and the radius would be the half of the square side size (sorry for stating the obvious).
So after you do this you might have problems with sizes (some mosquitos might appear closer to the camera than others) , since we are not trained the algorithm to be tolerant to scale. Moreover, even with all mosquitos in the same scale, we still might miss some of them just because they didn't fit in our grid perfectly. To address this, we will need to repeat this procedure (grid cut and predict) rescaling the given image to different sizes. How many sizes? well here you would have to determine that through experimentation.
This approach is sensitive to the size of the "window" that you are using, that is also something I would recommend you to experiment with.
There are some research may be useful:
A Multistep Approach for Shape Similarity Search in Image Databases
Representation and Detection of Shapes in Images
From the pictures you provided this seems to be an extremely hard image recognition problem, and I doubt you will get anywhere near acceptable recognition rates.
I would recommend a simpler approach:
First, if you have any control over the images, separate the mosquitoes before taking the picture, and use a white unmarked underground, perhaps even something illuminated from below. This will make separating the mosquitoes much easier.
Then threshold the image. For example here i did a quick try taking the red channel, then substracting the blue channel*5, then applying a threshold of 80:
Use morphological dilation and erosion to get rid of the small leg structures.
Identify blobs of the right size to be moquitoes by Connected Component Labeling. If a blob is large enough to be two mosquitoes, cut it out, and apply some more dilation/erosion to it.
Once you have a single blob like this
you can find the direction of the body using Principal Component Analysis. The head should be the part of the body where the cross-section is the thickest.

how to find shapes that are slightly elongated oval / rectangle with curved corners / sometimes sector of a circle?

how to recognise a zebra crossing from top view using opencv?
in my previous question the problem is to find the curved zebra crossing using opencv.
now I thought that the following way would be much easier way to detect it,
(i) canny it
(ii) find the contours in it
(iii) find the black stripes in it, in my case it is slightly oval in shape
now my question is how to find that slightly oval shape??
look here for images of the crossing: www.shaastra.org/2013/media/events/70/Tab/422/Modern_Warfare_ps_v1.pdf
Generally speaking, I believe there are two approaches you can consider.
One approach is the more brute force image analysis approach, as you described. Here you are applying heuristics based on your knowledge of the problem in order to identify the pixels involved in the parts of the path. Note that 'brute force' here is not a bad thing, just an adjective.
An alternative approach is to apply pattern recognition techniques to find the regions of the image which have high probability of being part of the path. Here you would be transforming your image into (hopefully) meaningful features: lines, points, gradient (eg: Histogram of Oriented Gradients (HOG)), relative intensity (eg: Haar-like features) etc. and using machine learning techniques to figure out how these features describe the, say, the road vs the tunnel (in your example).
As you are asking about the former, I'm going to focus on that here. If you'd like to know more about the latter have a look around the Internet, StackOverflow, or post specific questions you have.
So, for the 'brute force image analysis' approach, your first step would probably be to preprocess the image as you need;
Consider color normalization if you are going to analyze color later, this will help make your algorithm robust to lighting differences in your garage vs the event studio. It'll also improve robustness to camera collaboration differences, though the organization hosting the competition provide specs for the camera they will use (don't ignore this bit of info).
Consider blurring the image to reduce noise if you're less interested in pixel by pixel values (eg edges) and more interested in larger structures (eg gradients).
Consider sharpening the image for the opposite reasons of blurring.
Do a bit of research on image preprocessing. It's definitely an explored topic but hardly 'solved' (whatever that would mean). There are lots of things to try at this stage and, of course, crap in => crap out.
After that you'll want to generate some 'features'..
As you mentioned, edges seem like an appropriate feature space for this problem. Don't forget that there are many other great edge detection algorithms out there other than Canny (see Prewitt, Sobel, etc.) After applying the edge detection algorithm, though, you still just have pixel data. To get to features you'll want probably want to extract lines from the edges. This is where the Hough transform space will come in handy.
(Also, as an idea, you can think about colorspace in concert with the edge detectors. By that, I mean, edge detectors usually work in the B&W colorspace, but rather than converting your image to grayscale you could convert it to an appropriate colorspace and just use a single channel. For example, if the game board is red and the lines on the crosswalk are blue, convert the image to HSV and grab the hue channel as input for the edge detector. You'll likely get better contrast between the regions than just grayscale. For bright vs. dull use the value channel, for yellow vs. blue use the Opponent colorspace, etc.)
You can also look at points. Algorithms such as the Harris corner detector or the Laplacian of Gaussian (LOG) will extract 'key points' (with a different definition for each algorithm but generally reproducible).
There are many other feature spaces to explore, don't stop here.
Now, this is where the brute force part comes in..
The first thing that comes to mind is parallel lines. Even in a curve, the edges of the lines are 'roughly' parallel. You could easily develop an algorithm to find the track in your game by finding lines which are each roughly parallel to their neighbors. Note that line detectors like the Hough transform are usually applied such that they find 'peaks', or overrepresented angles in the dataset. Thus, if you generate a Hough transform for the whole image, you'll be hard pressed to find any of the lines you want. Instead, you'll probably want to use a sliding window to examine each area individually.
Specifically speaking to the curved areas, you can use the Hough transform to detect circles and elipses quite easily. You could apply a heuristic like: two concentric semi-circles with a given difference in radius (~250 in your problem) would indicate a road.
If you're using points/corners you can try to come up with an algorithm to connect the corners of one line to the next. You can put a limit on the distance and degree in rotation from one corner to the next that will permit rounded turns but prohibit impossible paths. This could elucidate the edges of the road while being robust to turns.
You can probably start to see now why these hard coded algorithms start off simple but become tedious to tweak and often don't have great results. Furthermore, they tend to rigid and inapplicable to other, even similar, problems.
All that said.. you're talking about solving a problem that doesn't have an out of the box solution. Thinking about the solution is half the fun, and half the challenge. Everything I've described here is basic image analysis, computer vision, and problem solving. Start reading some papers on these topics and the ideas will come quickly. Good luck in the competition.

Object recognition and measuring size

I'd like to create a system for use in a factory to measure the size of the objects coming off the assembly line. The objects are slabs of stone, approximately rectangular, and I'd like the width and height. Each stone is photographed in the same position with a flash, so the conditions are pretty controlled. The tricky part is the stones sometimes have patterns on their surface (often marble with ripples and streaks) and they are sometimes almost black, blending in with the shadows.
I tried simply subtracting each image from a reference image of the background, but there are enough small changes in the lighting and the positions of rollers and small bits of machinery that the output is really noisy.
The approach I plan to try next is to use Canny's edge detection algorithm and then use some kind of numerical optimization (Nelder-Mead maybe) to match a 4-sided polygon to the edges. Before I home-brew something, though, is there an existing approach that works well in this kind of situation?
If it helps, it would be possible to 'seed' the algorithm with a patch of the image known to be within the slab (they're always lined up in the corner) to help identify its surface pattern and colors. I could also produce a training set of annotated images if necessary.
Some sample images of the background and some stone slabs:
Have you tried an existing image segmentation algorithm?
I would start with the maxflow algorithm for image segmentation by Vladimir Kolmogorov here: http://pub.ist.ac.at/~vnk/software.html
In the papers they fix areas of an image to belong to a particular segment, which would help for you problem, but it may not be obvious how to do this in the software.
Deep learning algorithms for parsing scenes by Richard Socher might also help: http://www.socher.org/
And Eric Sudderth has at least one interesting method for visual scene understanding here: http://www.cs.brown.edu/~sudderth/software.html
I also haven't actually used any of this software, which is mostly, if not all, for research and not particularly user friendly.

pattern recognition between two very different images

So, my problem is that I have to find common points between two images of a microchip. Here's an example of two images:
Between these two images, we can clearly see some common pattern like the wires on the bottom right of the first images that can be found in relatively the same place in the second image. Also, the sort of white Z shape in the first image can be seen in the second images, a bit harder, but it's there.
I tried to match them with SURF (OpenCV), found no common point at all. Tried to apply some filter on both images, like edge detection, thresholding, and other filter that I could found in GIMP, but whatever I tried, no common point were ever found.
I'd like to know if you have any idea to solve this problem ? My suggestion right now would be to manually match key features in both images with line segments, but preferably, it should be automated.
A solution that uses OpenCV would be preferable, but I'm looking for any suggestion possible. In OpenCV, all pattern matching situation that I saw were problems way more obvious that this one. No difference in color and so on.
Unless realtime is required, do a simple approach to test if rotation can be automated:
Circuit boards like the ones in the images, are often based on perpendicular straight line segments. Hence you can "despeckle" and remove stuff like coffee stains, by finding linesegments.
Think about creating a kernel, that have a line with dark pixels on one side, and bright pixels on the other. Fold it on the image (or cross-correlate it) to identify all pixels that have a sequence of bright/dark pixels which are nearly vertical or horizontal.
you may interlace to speed things up.
edges of stains and speckles may survive this, if you want angles close to 45* representatations!
The resulting image can be interpreted as a sparse pointcloud.
You can now use RANSAC or other similar approaches to describe many of the remaining correlations, as line segments.
* use a 2 point line segment as input model for RANSAC, Degrade if small.
* Determine infinite lines that have many inliers
* use growth or binninng approaches to segmentate lines.
benefits:
high likelyhood of line segment representations that are actually present as circuitry in image. 2 point description of segments, possible transforms are easy.
easy interpretation of data, as it can be overlayed in openCV
Rotation should be easily found as the rotation that matches most found lines to horizontal and/or vertical axis'es.
apply rotation.
repeat for both images.
now you can determine best translation between the images, by simple x,y cross correlation.
If the top image is always of that quality (quasi bilevel patterns, easy edge detection), I would try a good geometric matching algorithm (such as Cognex or Halcon), training with the top image and searching the bottom one.
Maybe it is worth to first compensate rotation (I hope there is no scaling). You would do that by determining the dominant edge direction, possibly using a Hough transform. Or, much better, by careful mechanical alignment of the sensors.
Anyway, chances of success are low, this is a difficult problem.

Resources