Algorithm for capturing machine readable zones - opencv

What method is suitable to capture (detect) MRZ from a photo of a document? I'm thinking about cascade classifier (e.g. Viola-Jones), but it seems a bit weird to use it for this problem.

If you know that you will look for text in a passport, why not try to find passport model points on it first. Match template of a passport to it by using ASM/AAM (Active shape model, Active Appearance Model) techniques. Once you have passport position information you can cut out the regions that you are interested in. This will take some time to implement though.

Consider this approach as a great starting point:
Black top-hat followed by a horisontal derivative highlights long rows of characters.
Morphological closing operation(s) merge the nearby characters and character rows together into a single large blob.
Optional erosion operation(s) remove the small blobs.
Otsu thresholding followed by contour detection and filtering away the contours which are apparently too small, too round, or located in the wrong place will get you a small number of possible locations for the MRZ
Finally, compute bounding boxes for the locations you found and see whether you can OCR them successfully.
It may not be the most efficient way to solve the problem, but it is surprisingly robust.

A better approach would be the use of projection profile methods. A projection profile method is based on the following idea:
Create an array A with an entry for every row in your b/w input document. Now set A[i] to the number of black pixels in the i-th row of your original image.
(You can also create a vertical projection profile by considering columns in the original image instead of rows.)
Now the array A is the projected row/column histogram of your document and the problem of detecting MRZs can be approached by examining the valleys in the A histogram.
This problem, however, is not completely solved, so there are many variations and improvements. Here's some additional documentation:
Projection profiles in Google Scholar: http://scholar.google.com/scholar?q=projection+profile+method
Tesseract-ocr, a great open source OCR library: https://code.google.com/p/tesseract-ocr/

Viola & Jones' Haar-like features generate many (many (many)) features to try to describe an object and are a bit more robust to scale and the like. Their approach was a unique approach to a difficult problem.
Here, however, you have plenty of constraint on the problem and anything like that seems a bit overkill. Rather than 'optimizing early', I'd say evaluate the standard OCR tools off the shelf and see where they get you. I believe you'll be pleasantly surprised.
PS:
You'll want to preprocess the image to isolate the characters on a white background. This can be done quite easily and will help the OCR algorithms significantly.

You might want to consider using stroke width transform.
You can follow these tips to implement it.

Related

Image similarity of apartment photos

I want to design an algorithm that would find matches in images of the same apartment, when put up by different real estate agents.
Photos are relatively taken in similar time so the interior of the rooms should not change that much but of course every guys takes different pictures from different angles, etc.
(TLDR; a apartment goes for sale, and different real estate guys come in and make their own pictures, and I want to know if the given pictures from various guys are of the same place)
I know that image processing and recognition algorithm selections highly depend on the use case, so could you point me in correct direction given my use-case?
http://reality.bazos.sk/inzerat/56232813/Prenajom-1-izb-bytu-v-sirsom-centre.php
http://reality.bazos.sk/inzerat/56371292/-PRENAJOM-krasny-1i-byt-rekonstr-Kupeckeho-Ruzinov-BA-II.php
You can actually use Clarifai's Custom Training API endpoint, fairly simple and straightforward. All you would have to do is train the initial image and then compare the second to it. If the probability is high, it is likely the same apartment. For example:
In javascript, to declare a positive it is:
clarifai.positive('http://example.com/apartment1.jpg', 'firstapartment', callback);
And a negative is:
clarifai.negative('http://example.com/notapartment1.jpg', 'firstapartment', callback);
You don't necessarily have to do a negative, but it could only help. Then, when you are comparing images to the first aparment, you do:
clarifai.predict('http://example.com/someotherapartment.jpg', 'firstapartment', callback);
This will give you a probability regarding the likeness of the photo to what you've trained ('firstapartment'). This API is basically doing machine learning without the hassle of the actual machine. Clarifai's API also has a tagging input that is extremely accurate with some basic tags. The API is free for a certain number of calls/month. Definitely worth it to check out for this case.
As user Shaked mentioned in a comment, this is a difficult problem. Even if you knew the position and orientation of each camera in space, and also the characteristics of each camera, it wouldn't be a trivial problem to match the images.
A "bag of words" (BoW) approach may be of use here. Rather than try to identify specific objects and/or deduce the original 3D scene, you determine what "feature descriptors" can distinguish objects from one another in your image sets.
https://en.wikipedia.org/wiki/Bag-of-words_model_in_computer_vision
Imagine you could describe the two images by the relative locations of textures and colors:
horizontal-ish line segments at far left
red blob near center left
green clumpy thing at bottom left
bright round object near top left
...
then for a reasonably constrained set of images (e.g. photos just within a certain zip code), you may be able to yield a good match between the two images above.
The Wikipedia article on BoW may look a bit daunting, but I think if you hunt around you'll find an article that describes "bag of words" for image processing clearly. I've seen a very good demo of a BoW approach used to identify objects such as boats and delivery vans in arbitrary video streams, and it worked impressively well. I wish I had a copy of the presentation to pass along.
If you don't suspect the image to change much, you could try the standard first step of any standard structure-from-motion algorithm to establish a notion of similarity between a pair of images. Any pair of images are similar if they contain a number of matching image features larger than a threshold which satisfy the geometrical constraint of the scene as well. For a general scene, that geometrical constraint is given by a Fundamental Matrix F computed using a subset of matching features.
Here are the steps. I have inserted the opencv method for each step, but you could write your methods too:
Read the pair of images. Use img = cv2.imread(filename).
Use SIFT/SURF to detect image features/descriptors in both images.
sift = cv2.xfeatures2d.SIFT_create()
kp, des = sift.detectAndCompute(img,None)
Match features using the descriptors.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1,des2)
Use RANSAC to compute funamental matrix.
cv2.findFundamentalMatrix(pts1, pts2, cv2.FM_RANSAC, 3, 0.99, mask)
mask contains all the inliers. Simply count them to determine if the number of matches satisfying geometrical constraint is large enough.
CAUTION: In case of a planar scene, we use homography instead of a fundamental matrix and the steps described above work out pretty nicely because homography takes a point to a corresponding point in the other image. However, Fundamental matrix takes a point to the corresponding epipolar line in the other image, which makes the entire process a bit less stable. So I would recommend trying these steps a few more times with a little bit of jitter to the feature locations and collating the evidence over more than one trial to make the decision. You can also use more advanced steps to introduce robustness to this process but only if the steps described above don't yield the results you need.

Detect banana or apple among the bunch of fruits on a plate with > 90% success rate. (See image)

Hi! I'm kinda new to OpenCV and Image processing. I've tried following approaches until now, but I believe there's gotta be a better approach.
1). Finding color range (HSV) manually using GColor2/Gimp tool/trackbar manually from a reference image which contains a single fruit (banana)
with a white background. Then I used inRange(), findContour(),
drawContour() on both reference banana image & target
image(fruit-platter) and matchShapes() to compare the contours in the
end.
It works fine as long as the color range chosen is appropriate. (See 2nd image). But since these fruits doesn’t have uniform solid color, this approach didn't seem like an ideal approach to me. I don't want to hard-code the color-range (Scalar values) inside inRange().
2). Manual thresholding and contour matching.
Same issue as (1). Don't wanna hard-code the threshold value.
3). OTSU thresholding and canny edge detection.
Doesn't work well for banana, apple and lemon.
4). Dynamically finding colors. I used the cropped banana reference
image. Calculated the mean & standard deviation of the image.
Don't know how to ignore the white background pixels in my mean/std-dev calculation without looping through each x,y pixels. Any suggestions on this are welcome.
5). Haar Cascade training gives inaccurate results. (See the image below). I believe proper training might give better results. But not interested in this for now.
Other approaches I’m considering:
6). Using floodfill to find all the connected pixels and
calculating
the average and standard deviation of the same.
Haven't been successful in this. Not sure how to get all the connected pixels. I dumped the mask (imwrite) and got the banana (from the reference banana image) in black & white form. Any suggestions on this are welcome.
7). Hist backprojection:- not sure how it would help me.
8). K-Means , not tried yet. Let me know, if it’s better than step
(4).
9). meanshift/camshift → not sure whether it will help. Suggestions are welcome.
10). feature detection -- SIFT/SURF -- not tried yet.
Any help, tips, or suggestions will be highly appreciated.
Answers to such generic questions (object detection), especially to ones like this that are very active research topics, essentially boil down to a matter of preference. That said, of the 10 "approaches" you mentioned, feature detection/extraction is probably the one deserving the most attention, as it's the fundamental building block of a variety of computer vision problems, including but not limited to object recognition/detection.
A very simple but effective approach you can try is the Bag-of-Words model, very commonly used in early attempts at fast object detection, with all global spatial relationship information lost.
Late object detection research trend from what I observed from annual computer vision conference proceedings is that you encode each object by a graph that store feature descriptors in the nodes and store the spatial relationship information in the edges, so part of the global information is preserved, as we can now match not only the distance of feature descriptors in feature space but also the spatial distance between them in image space.
One common pitfall specific to this problem you described is that the homogeneous texture on banana and apple skins may not warrant a healthy distribution of features and most features you detect will be on the intersections of (most commonly) 3 or more objects, which in itself isn't a commonly regarded "good" feature. For this reason I suggest looking into superpixel object recognition (Just Google it. Seriously.) approaches, so the mathematical model of class "Apple" or "Banana" will be a block of interconnecting superpixels, stored in a graph, with each edge storing spatial relationship information and each node storing information concerning the color distribution etc. of the neighborhood specified by the superpixel. Then recognition will be come a (partial) graph matching problem or a problem related to probabilistic graphical model with many existing research done w.r.t it.

Recognition and counting of books from side using OpenCV

Just wish to receive some ideas on I can solve this problem.
For a clearer picture, here are examples of some of the image that we are looking at:
I have tried looking into thresholding it, like otsu, blobbing it, etc. However, I am still unable to segment out the books and count them properly. Hardcover is easy of course, as the cover clearly separates the books, but when it comes to softcover, I have not been able to successfully count the number of books.
Does anybody have any suggestions on what I can do? Any help will be greatly appreciated. Thanks.
I ran a sobel edge detector and used Hough transform to detect lines on the last image and it seemed to be working okay for me. You can then link the edges on the output of the sobel edge detector and then count the number of horizontal lines. Or, you can do the same on the output of the lines detected using Hough.
You can further narrow down the area of interest by converting the image into a binary image. The outputs of all of these operators can be seen in following figure ( I couldn't upload an image so had to host it here) http://www.pictureshoster.com/files/v34h8hvvv1no4x13ng6c.jpg
Refer to http://www.mathworks.com/help/images/analyzing-images.html#f11-12512 for some more useful examples on how to do edge, line and corner detection.
Hope this helps.
I think that #audiohead's recommendation is good but you should be careful when applying the Hough transform for images that will have the library's stamp as it might confuse it with another book (You can see that the letters form some break-lines that will be detected by sobel).
Consider to apply first an edge preserving smoothing algorithm such as a Bilateral Filter. When tuned correctly (setting of the Kernels) it can avoid these such of problems.
A Different Solution That Might Work (But can be slow)
Here is a different approach that is based on pixel marking strategy.
a) Based on some very dark threshold, mark all black pixels as visited.
b) While there are unvisited pixels: Pick the next unvisited pixel and apply a region-growing algorithm http://en.wikipedia.org/wiki/Region_growing while marking its pixels with a unique number. At this stage you will need to analyse the geometric shape that this region is forming. A good criteria to detecting a book is that the region is creating some form of a rectangle where width >> height. This will detect a book and mark all its pixels to the unique number.
Once there are no more unvisited pixels, the number of unique numbers is the number of books you will have + For each pixel on your image you will now to which book does it belongs.
Do you have to keep the books this way? If you can change the books to face back side to the camera then I think you can get more information about the different colors used by different books.The lines by Hough transform or edge detection will be more prominent this way.
There exist more sophisticated methods which are much better in contour detection and segmentation, you can have a look at them here, however it is quite slow, http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html
Once you get the ultrametric contour map, you can perform some computation on them to count the number of books
I would try a completely different approach; with paperbacks, the covers are medium-dark lines whilst the rest of the (assuming white pages) are fairly white and "bloomed", so I'd try to thicken up the dark edges to make them easy to detect, then that would give the edges akin to working with hardbacks which you say you've done.
I'd try something like an erosion to thicken up the edges. This would be a nice, fast operation.

Extract numbers from Image

I have an image for mobile phone credit recharge card and I want to extract the recharge number only (the gray area) as a sequence of number that can be used to recharge the phone directly
This is a sample photo only and cannot be considered as standard, thus the rectangle area may differ in position , in the background and the card also may differ in size .The scratch area may not be fully scratched , the camera's depth and position may differ too . I read a lots and lots of papers on the internet but i can't find any thing that could be interesting and most of papers discuss detection of handwritten numbers .
Any links or algorithms names could be very useful .
You can search the papers on vehicle plate number detection with machine learning methods. Basically you need to extract the number first, you may use sobel filter to extract the vertical edges , then threshold (binary image) and morphologic operations (remove blank spaces between each vertical edge line, and connect all regions that have a high number of edges). Finally retrieve the contour and fill in the connected components with mask.
After you extract the numbers , you can use machine learning method such as neural network and svm to recognize them.
Hope it helps.
Extract the GRAY part from image and then Use Tesseract(OCR) to extract the text written on the gray image.
I think you may not find the algorithm to read from the image on the internet. Nobody will disclose that. I think, if you are a hardcore programmer you can crack that using your own code. I tried from the screenshots where the fonts were clearer and the algorithm was simple. For this, the algorithm should be complex since you are reading from photo source instead of a screenshot.
Follow the following steps:
Load the image.
Select the digits ( By contour finding and applying constraints on area and height of letters to avoid false detections). This will split the image and thus modularise the OCR operation you want to perform.
A simple K - nearest neighbour algorithm for performing the identification and classification.
If the end goal was just to make a bot, you could probably pull the text directly from the app rather than worrying about OCR, but if you want to learn more about machine learning and you haven't done them already the MNIST and CIFAR-10 datasets are fantastic places to start.
If you preprocessed your image so that yellow pixels are black and all others are white you would have a much cleaner source to work with.
If you want to push forward with Tesseract for this and the preprocessing isn't enough then you will probably have to retrain it for this font. You will need to prepare a corpus, process it similarly to how you expect your source data to look, and then use something like qt-box-editor to correct the data. This guide should be able to walk you through the basic steps of retraining.

How to align two different pictures in such a way, that they match as close as possible?

I need to automatically align an image B on top of another image A in such a way, that the contents of the image match as good as possible.
The images can be shifted in x/y directions and rotated up to 5 degrees on z, but they won't be distorted (i.e. scaled or keystoned).
Maybe someone can recommend some good links or books on this topic, or share some thoughts how such an alignment of images could be done.
If there wasn't the rotation problem, then I could simply try to compare rows of pixels with a brute-force method until I find a match, and then I know the offset and can align the image.
Do I need AI for this?
I'm having a hard time finding resources on image processing which go into detail how these alignment-algorithms work.
So what people often do in this case is first find points in the images that match then compute the best transformation matrix with least squares. The point matching is not particularly simple and often times you just use human input for this task, you have to do it all the time for calibrating cameras. Anyway, if you want to fully automate this process you can use feature extraction techniques to find matching points, there are volumes of research papers written on this topic and any standard computer vision text will have a chapter on this. Once you have N matching points, solving for the least squares transformation matrix is pretty straightforward and, again, can be found in any computer vision text, so I'll assume you got that covered.
If you don't want to find point correspondences you could directly optimize the rotation and translation using steepest descent, trouble is this is non-convex so there are no guarantees you will find the correct transformation. You could do random restarts or simulated annealing or any other global optimization tricks on top of this, that would most likely work. I can't find any references to this problem, but it's basically a digital image stabilization algorithm I had to implement it when I took computer vision but that was many years ago, here are the relevant slides though, look at "stabilization revisited". Yes, I know those slides are terrible, I didn't make them :) However, the method for determining the gradient is quite an elegant one, since finite difference is clearly intractable.
Edit: I finally found the paper that went over how to do this here, it's a really great paper and it explains the Lucas-Kanade algorithm very nicely. Also, this site has a whole lot of material and source code on image alignment that will probably be useful.
for aligning the 2 images together you have to carry out image registration technique.
In matlab, write functions for image registration and select your desirable features for reference called 'feature points' using 'control point selection tool' to register images.
Read more about image registration in the matlab help window to understand properly.

Resources