Rotation Invariant Number Recognition - opencv

I have videos of tagged bees I would like to track. I can get the tag coordinates and the tag color, but I can not reliably get the numbers on the tags.
I can extract a tag and get an image like this:
But I still have trouble recognizing the number. I am using Python and OpenCV. I have tried Tesseract, but haven't had any success. The rotation of the tags is arbitrary, which is a major problem. Also, I am not sure if it is possible to distinguish 66 from 99 by looking at the tag only.
So, what is the best way to get the numbers on the tags?

The OpenCV function minAreaRect() fits a minimum-area rectangle to the digits.
Assuming the digit height is always greater than the digit width, and that the minimum-area rectangle is aligned with the axis of the digits, I obtained a rotation value. This seems to do the trick in most cases.
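A minimal sketch of that idea, assuming the extracted tag is available as a grayscale image and the digits come out white after thresholding (both assumptions); note that the angle convention of minAreaRect differs between OpenCV versions, so the 90-degree correction may need adjusting:

import cv2

tag = cv2.imread("tag.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file name for the extracted tag
_, binary = cv2.threshold(tag, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # digits assumed dark on light
points = cv2.findNonZero(binary)                    # coordinates of all digit pixels
(cx, cy), (w, h), angle = cv2.minAreaRect(points)   # minimum-area rectangle around the digits
if w > h:
    angle += 90                                     # enforce the "height > width" assumption
M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
upright = cv2.warpAffine(tag, M, tag.shape[1::-1])  # digits upright, still ambiguous by 180 degrees (66 vs 99)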

Related

How can you measure the length of curved grid lines on an image?

Suppose you have an image like this:
How can you measure the combined length of all the lines in this image?
I have tried (naively) skeletonising the image and then counting the number of pixels. However, this gives inaccurate results, as diagonal steps are actually longer than vertical/horizontal ones.
My other idea is to generate a chain code for all the line segments, and then use something like Freeman's method to measure the length from the chain code. However, generating the chain code is going to be quite tricky, I think, as chain codes usually start/stop at the same point, and this won't work for the grid shape.
Am I missing something obvious here? Is there an easier way to do this?
As far as I can see, the strokes are 3 pixels wide, so dividing the number of black pixels by three isn't too bad an approximation.
Alternatively, use a thinning algorithm to reduce the strokes to a single pixel width (8-connectivity), then seed-fill the whole outline with a simple recursive 8-way fill, counting the lateral and diagonal moves separately. The length is then given by L + D√2, where L is the number of lateral moves and D the number of diagonal moves.
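A rough sketch of the counting step, assuming a 1-pixel-wide binary skeleton is already available (e.g. from scikit-image's skeletonize or opencv-contrib's thinning); it counts pixel adjacencies rather than fill moves, which slightly overestimates at junctions:

import numpy as np

def skeleton_length(skel):
    s = skel.astype(bool)   # skel: 1-pixel-wide skeleton, nonzero on the lines
    # lateral adjacencies: pairs of skeleton pixels touching horizontally or vertically
    lateral = np.count_nonzero(s[:, :-1] & s[:, 1:]) + np.count_nonzero(s[:-1, :] & s[1:, :])
    # diagonal adjacencies: pairs of skeleton pixels touching along either diagonal
    diagonal = np.count_nonzero(s[:-1, :-1] & s[1:, 1:]) + np.count_nonzero(s[:-1, 1:] & s[1:, :-1])
    return lateral + diagonal * np.sqrt(2)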

How to group letters in OpenCV knowing their RotatedRects?

I have an image with letters, for example like this:
It's a binary image obtained from previous image processing stages, and I know the boundingRect and RotatedRect of every letter, but these letters are not grouped into words yet. It is worth mentioning that a RotatedRect can be returned by minAreaRect() or fitEllipse(), as shown here and here. In my case the RotatedRects look like this:
Blue rectangles are obtained from minAreaRect and red ones from fitEllipse. They give slightly different boxes (center, width, height, angle), but the biggest difference is in the angle values: in the first case the angle ranges from -90 to 0 degrees, in the second from 0 to 180 degrees. My problem is: how do I group these letters into words, based on the parameters of their RotatedRects? I can check the angle of every RotatedRect and also measure the distance between the centers of every two RotatedRects. With simple assumptions on the direction of the text and the distance between letters, my grouping algorithm works. But in more complicated cases I run into problems. For example, in the image below there are a few groups of text, with different directions, different angles and different distances between letters.
Problems arise when a letter from one word is close to a letter from another word, and when the angle of a RotatedRect inside a given word differs too much from the angles of its neighbours. What could be the best way to connect the letters into the right words?
First, you need to define a metric. It may be the Euclidean 3D distance, for example ||(delta_x, delta_y, delta_angle)||, where delta_x and delta_y are the distances between the rectangle centers along the x and y coordinates, and delta_angle is the distance between the angular orientations.
In short, your rectangles become 3D data points with coordinates (x, y, angle).
Once you have defined this, you can use a clustering algorithm on your data. DBSCAN should work well here. Check this article for example: link. It may help you choose a clustering algorithm.
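A minimal sketch of that idea with scikit-learn; the angle weight, eps and min_samples values are assumptions you would have to tune, and this simple version ignores the fact that the angle is periodic:

import numpy as np
from sklearn.cluster import DBSCAN

# rects: list of RotatedRects as ((cx, cy), (w, h), angle), e.g. from cv2.minAreaRect
angle_weight = 2.0   # assumed: how many pixels one degree of angle difference is worth
features = np.array([[cx, cy, angle_weight * angle] for (cx, cy), _, angle in rects])

labels = DBSCAN(eps=30.0, min_samples=2).fit_predict(features)

words = {}
for rect, label in zip(rects, labels):
    if label != -1:                      # -1 marks noise, i.e. letters that joined no cluster
        words.setdefault(label, []).append(rect)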
I extended the aforementioned metric with a few other elements related to the geometric properties of letters and words (distances, angles, areas, the ratio of neighbouring letters' areas, etc.), and now it works fine. Thanks for the suggestion.

Build blocks and isolate characters OpenCV

I have been searching for an answer for a while to this question but cannot find anything useful.
I am trying to read the machine readable zone (MRZ) with a camera. I need to extract the characters one by one from the machine readable zone and feed them to OCR. I tried thresholding the image, finding contours, and extracting the characters one by one, but on a live camera feed findContours misses some characters and the results are not what I expected.
Since the machine readable zone has a known size and format, is there a proper method to build blocks for each character and extract them?
UPDATE CODE
import cv2

rect = []
blur = cv2.medianBlur(roi_gray, 3)  # roi_gray is the horizontally aligned MRZ zone
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 11, 2)
# OpenCV 3.x returns (image, contours, hierarchy)
_, contours, hierarchy = cv2.findContours(thresh.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=cv2.contourArea, reverse=True)[:90]
minH = 20
minW = 20
for ctr in contours:
    if cv2.contourArea(ctr) < 1000:        # skip very large blobs (whole MRZ lines)
        xyc, wh, a = cv2.minAreaRect(ctr)  # ((cx, cy), (w, h), angle)
        w, h = wh
        x, y = xyc
        if h >= minH or w >= minW:
            rect.append(cv2.boundingRect(cv2.approxPolyDP(ctr, 3, True)))
rect contains the collected contours, but the problem is that after thresholding a character such as N sometimes splits into two contours, or is not found by findContours at all, so the letter is missing from the final output.
Video
I have found a video where the author seems to build blocks for each character, but unfortunately no additional information about the method or code is provided. Video link
To me, the ID text area of interest has a known aspect ratio; maybe the block refers to that text area. Given that aspect ratio (± some error), it may be possible to remove the other text areas. In OpenCV 3 there is also a text detector.
Moreover, I suppose the detected area is tracked; at least it seems so in the video.
IMHO that app does a blur, then a binarization, then an erode-dilate to detect the text lines. So, after a warp correction (or maybe even a little perspective correction), a vertical projection lets you detect the character width, so you can isolate each character and feed it to the OCR.
Following up on the comment, here is more detail on the character area. I would do an opening operation to fill the white gaps inside the letters, or to link the contours. Then, by simply summing the pixel values along each column, you get a vertical projection. Now you have minima between the characters, and you can estimate the character width by averaging the distances between those minima.
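A minimal sketch of the projection idea, assuming a binary MRZ strip with white characters on a black background; the morphological step (a closing here, to link broken strokes), the polarity and the 5% gap threshold are assumptions, and it splits at near-empty columns instead of averaging the minima spacing:

import cv2
import numpy as np

# thresh: binary MRZ strip, white characters on black background (assumed polarity)
closed = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8))  # link broken strokes
projection = closed.sum(axis=0)                     # vertical projection: one value per column
gap = projection < 0.05 * projection.max()          # near-empty columns are gaps between characters

cols = np.where(~gap)[0]
chars = []
if cols.size:
    breaks = np.where(np.diff(cols) > 1)[0]
    for run in np.split(cols, breaks + 1):          # each run of non-gap columns is one character candidate
        chars.append(closed[:, run[0]:run[-1] + 1]) # slice to feed to the OCR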
What you can also do is not recompute this width on every frame, but keep a width that varies little over consecutive frames. You can achieve this by averaging the widths from the last 5 frames (using a queue).
Try it and come back with some results; that way we will be able to help you more.
There is an OpenCV forum too; you'll probably find more information there.

Detect degree of rotation of an image

I am doing a project in OpenCV to detect handwritten characters on a user-filled form. I have made an algorithm to detect the skew angle of the scanned image using the Hough Line Transform. But it does not work when the image is rotated by 180 degrees, since 0 and 180 degrees are treated the same by the Hough Line function. My image contains some rectangles for filling in data, and some text. So how do I detect whether a scanned image is rotated by 180 degrees?
Since I first have to correct the skew angle of the image before I can detect exactly where the user-filled data (which I need to extract) lies, using the rectangle coordinates from the empty template form provided earlier, answers that do not rely on character recognition are appreciated.
To resolve the 180° ambiguity, only OCR can tell you: perform two reads on the deskewed text, one at the given angle and one at the angle + 180°, and keep the more successful read.
Unless you have some a priori information, it's the only way, as other image processing operations don't know about characters.
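A minimal sketch of those two reads, assuming pytesseract is available; comparing the mean word confidence is just one possible way to pick the "more successful" read:

import cv2
import pytesseract
from pytesseract import Output

def mean_confidence(img):
    data = pytesseract.image_to_data(img, output_type=Output.DICT)
    confs = [float(c) for c in data["conf"] if float(c) >= 0]   # -1 marks non-text boxes
    return sum(confs) / len(confs) if confs else 0.0

def fix_upside_down(deskewed):
    flipped = cv2.rotate(deskewed, cv2.ROTATE_180)
    # keep whichever orientation OCR reads with more confidence
    return flipped if mean_confidence(flipped) > mean_confidence(deskewed) else deskewed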
UPDATE:
Some strings are forever ambiguous, like 0689HINOSXZ <=> ZXSONIH6890.
If the layout of the text is known (boxes) and asymmetric, it is a relatively easy matter to check matching of the text strings to the layout: choose a box (such as the topmost) and a string (the topmost), and align them by translation; then see how the other boxes and strings match (using a nearest neighbor rule) and establish the correspondences. Compare results with the straight and flipped layout, and keep the best overall area of overlap.
For reliability, it can be better to try more than a starting box/string pair, as there can be some ambiguity to which is the topmost (it could even be missing).
Isn't your problem more general? Let's say you detect a skew angle of +45 degrees and rotate the image by -45 degrees. Then it could still be that the image is rotated by 180 degrees, because it was not rotated by +45 degrees but by -135 instead.
Anyway, to the actual question: I am not an expert in character recognition, but if you use it anyway in your application, couldn't you just try character recognition for both rotations and then choose the one that gets the stronger response?
If you match the rectangles in your template with those of the skew corrected image, you'll be able to get the correct orientation (but only if there's no symmetry in the placement of those rectangles). For matching you may be able to use the rectangles in your template as a mask to extract regions from skew corrected image.
EDIT
Suppose your template and the skew corrected image look like this (in the best case where there are no displacements in skew corrected) :
Then you can use the template as a mask to copy data from skew corrected image. Then check what fraction of the white pixels in the template is contained in the copied image. This value will be very low for a 180 degree rotated image.
But as you say, this won't work in practice because of the displacements. Then maybe you can try template matching (cross-correlation), using the template image as the template. The location and strength of the strongest peak would give you some indication of the orientation. You can perform template matching at a reduced resolution so it runs faster.
You could try to match keypoints (Harris, SIFT, ...) between the scanned image and the empty template. With the matched points you can easily find a transformation to align the scanned image with the template. This may work for your case, but you are more likely to succeed if there are some textured logos in the images, as is usually the case for forms.
Can't you simply compute two cross-correlations, one with a 180° rotation and one without? The one with the matching rectangles should give a higher correlation maximum (provided the image contrast of the rest of the page is not too misleading, but some pre-filtering could help here).
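A minimal sketch of that check, assuming the deskewed scan and the empty template are grayscale images at comparable scale with the template no larger than the scan; the file names are hypothetical, and matchTemplate's normalized correlation stands in for the cross-correlation:

import cv2

scan = cv2.imread("scan_deskewed.png", cv2.IMREAD_GRAYSCALE)      # hypothetical file names
template = cv2.imread("empty_template.png", cv2.IMREAD_GRAYSCALE)

def correlation_peak(img, tmpl):
    result = cv2.matchTemplate(img, tmpl, cv2.TM_CCOEFF_NORMED)   # normalized cross-correlation
    _, max_val, _, _ = cv2.minMaxLoc(result)
    return max_val

flipped = cv2.rotate(scan, cv2.ROTATE_180)
upright = scan if correlation_peak(scan, template) >= correlation_peak(flipped, template) else flipped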

Opencv match contour image

I'd like to know the best strategy to compare groups of contours (in fact, edges resulting from Canny edge detection) from two pictures, in order to know which pair is most alike.
I have this image:
http://i55.tinypic.com/10fe1y8.jpg
And I would like to know how can I calculate which one of these fits best to it:
http://i56.tinypic.com/zmxd13.jpg
(it should be the one on the right)
Is there anyway to compare the contours as a whole?
I can easily rotate the images but I don't know what functions to use in order to calculate that the reference image on the right is the best fit.
Here is what I've already tried using OpenCV:
matchShapes function - I tried this function using 2 grayscale images and I always get the same result for every comparison image, and the value seems wrong, as it is 0.0002.
So what I realized about matchShapes, though I'm not sure it's the correct assumption, is that the function works with pairs of contours and not full images. Now this is a problem because, although I have the contours of the images I want to compare, there are hundreds of them and I don't know which ones should be "paired up".
So I also tried to compare all the contours of the first image against the other two with a for loop, but I might be comparing, for example, the contour of the 5 against the circle contour of the two reference images rather than against the contour of the 2.
I also tried the simple cv::compare function and matchTemplate, neither with success.
Well, for this you have a couple of options depending on how robust you need your approach to be.
Simple Solutions (with assumptions):
For these methods, I'm assuming the images you supplied are what you are working with (i.e., the objects are already segmented and approximately at the same scale). Also, you will need to correct the rotation, at least coarsely. You might do something like iteratively rotating the comparison image every 10, 30, 60, or 90 degrees, or whatever coarseness you feel you can get away with.
For example,
import cv2
# compareCoin and targetCoin are the segmented coin images, assumed to be the same size
bestMetric = float("inf")
coinRotation = 0
h, w = compareCoin.shape[:2]
for degrees in range(10, 360, 10):
    M = cv2.getRotationMatrix2D((w / 2, h / 2), degrees, 1.0)
    coinRot = cv2.warpAffine(compareCoin, M, (w, h))
    # you could also try Cosine Similarity, or even matchTemplate here
    metric = int(cv2.absdiff(coinRot, targetCoin).sum())  # Sum of Absolute Differences
    if metric < bestMetric:                               # smaller SAD = better match
        bestMetric = metric
        coinRotation = degrees
Sum of Absolute Differences (SAD): This will allow you to quickly compare the images once you have determined an approximate rotation angle.
Cosine Similarity: This operates a bit differently by treating the image as a 1D vector and then computing the high-dimensional angle between the two vectors. The better the match, the smaller the angle will be.
Complex Solutions (possibly more robust):
These solutions will be more complex to implement, but will probably yield more robust classifications.
Hausdorff Distance: This answer will give you an introduction to using this method. This solution will probably also need the rotation correction to work properly.
Fourier-Mellin Transform: This method is an extension of Phase Correlation, which can extract the rotation, scale, and translation (RST) transform between two images.
Feature Detection and Extraction: This method involves detecting "robust" (i.e., scale and/or rotation invariant) features in the image and comparing them against a set of target features with RANSAC, LMedS, or simple least squares. OpenCV has a couple of samples using this technique in matcher_simple.cpp and matching_to_many_images.cpp. NOTE: With this method you will probably not want to binarize the image, so there are more detectable features available.
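A rough Python sketch of the feature-based approach using ORB (the detector choice, the ratio-test threshold and the file names are assumptions; the OpenCV samples mentioned above are the C++ counterparts):

import cv2

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)       # hypothetical file names,
img2 = cv2.imread("candidate.png", cv2.IMREAD_GRAYSCALE)    # grayscale and not binarized

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING)                   # Hamming distance for ORB's binary descriptors
matches = matcher.knnMatch(des1, des2, k=2)
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])                                # Lowe's ratio test

# the candidate with the most surviving matches is the best fit;
# you could additionally verify the geometry with cv2.findHomography(..., cv2.RANSAC)
print(len(good))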

Resources