I've been watching some videos about face detection (the Viola-Jones algorithm) and understood its principle.
But I was wondering how simpler subjects can be found, for example a barcode in an image. I can't imagine the Viola-Jones algorithm working for this, as I assume it would easily produce wrong results.
How can a simple shape like a QR code or a barcode be found, and its angle/outer box located, within an image, without mistaking a plain text box for a barcode?
NOTE: I'm not searching for a library that does this, or some code. I would just like to understand the mechanics behind this.
There is no single answer; this is symbol-dependent.
A QR code is designed to be located by its "finder patterns": filled squares each surrounded by another square, placed at three of its corners. You can find these by binarization, connected-components analysis, and containment tests. From the three corners, you can infer the outline.
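To make that concrete, here's a minimal Python/OpenCV sketch of the containment-test idea: binarize, extract the contour hierarchy, and keep roughly square contours that have two levels of nested children (the dark-light-dark structure of a finder pattern). The file name and all thresholds are assumptions, not a definitive implementation.

```python
import cv2

# Minimal finder-pattern sketch (assumed: file name, thresholds; a real
# detector also needs module-ratio and orientation checks).
img = cv2.imread("qr.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# RETR_TREE keeps the full containment hierarchy between contours.
contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE,
                                       cv2.CHAIN_APPROX_SIMPLE)

def nesting_depth(i, hierarchy):
    """Count how many contours are nested inside contour i."""
    depth = 0
    child = hierarchy[0][i][2]          # index of first child, -1 if none
    while child != -1:
        depth += 1
        child = hierarchy[0][child][2]
    return depth

candidates = []
for i, c in enumerate(contours):
    # A finder pattern is a dark square inside a light square inside a dark
    # square, so its outer contour has roughly two nested children.
    if nesting_depth(i, hierarchy) >= 2 and cv2.contourArea(c) > 50:
        x, y, w, h = cv2.boundingRect(c)
        if 0.7 < w / float(h) < 1.3:    # roughly square
            candidates.append((x, y, w, h))

print(candidates)  # ideally three boxes, one per corner finder pattern
```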
Barcodes can be found by detecting the bars and checking their spatial relations, or by drawing profiles across the image and finding characteristic patterns in the sequence of edges crossed. Finding the exact bounding box can be a little challenging.
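Here's a hedged sketch of the profile idea in Python/NumPy: binarize one scanline, run-length encode it, and flag it as barcode-like if it crosses many narrow alternating runs. The file name and all thresholds are assumptions you would tune.

```python
import cv2
import numpy as np

# Sketch: draw a horizontal profile across the image and look for the
# dense sequence of alternating dark/light bars of a 1-D barcode.
img = cv2.imread("barcode.png", cv2.IMREAD_GRAYSCALE)
row = img[img.shape[0] // 2]                 # one scanline across the middle
binary = (row < 128).astype(np.int8)         # 1 = dark, 0 = light (assumed threshold)

# Run-length encode the scanline: widths of consecutive equal-valued runs.
changes = np.flatnonzero(np.diff(binary)) + 1
runs = np.diff(np.concatenate(([0], changes, [len(binary)])))

# A barcode produces many narrow runs, since bar widths are small
# multiples of a single module width.
narrow = runs[runs < 20]
if len(narrow) > 30:                         # assumed counts
    print("scanline likely crosses a barcode")
```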
I am trying to create a form which will be filled in and photographed later on. An issue that I am facing is alignment. I came across some deep-learning solutions which detect the corners of the form, but these are often inaccurate in my use case, where the sheet of paper is folded and reopened or crumpled. I also don't have many flexibility or hard-coding options in the deep-learning process.
Are there any patterns which OpenCV can detect with ~100% accuracy, no matter the orientation of the pattern? I will be putting different patterns on the 4 corners of the sheet. I am thinking of using the built-in template-matching function or other pattern-recognition algorithms. There are some common patterns, like a big '+' sign or a star, that I am trying to avoid. I also tried putting barcodes on the corners, because they are also detected fairly easily (I'm not concerned with the contents of the barcodes, only their relative positioning), but depending on the image quality the barcodes aren't always detected.
ArUco markers sound like the best option for you; they are easy to use in OpenCV.
ArUco example and documentation: https://docs.opencv.org/4.x/d5/dae/tutorial_aruco_detection.html
Python example: https://pyimagesearch.com/2020/12/21/detecting-aruco-markers-with-opencv-and-python/
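A minimal detection sketch in Python might look like this (the ArucoDetector class needs OpenCV 4.7+; older versions expose cv2.aruco.detectMarkers instead; the file name and dictionary choice are assumptions):

```python
import cv2

# Minimal ArUco detection sketch (OpenCV >= 4.7; file name and dictionary
# choice are assumptions). Put one marker on each corner of the sheet.
img = cv2.imread("form.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

corners, ids, rejected = detector.detectMarkers(gray)
print(ids)   # the marker ids found; their corners give you the sheet pose
```

Because each marker carries a unique id, you also know which corner of the sheet is which, which makes de-warping a folded sheet much more tractable.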
I have a picture of a notebook (with a square grid), and lines and dots are drawn in it as in the description. The output should be a data structure containing info about the boundaries and the dots. How can one accomplish that? If possible, the program should process this dynamically (given a video).
Yes, this can be accomplished with various image-processing techniques.
One famous technique that can help is the Canny edge detector. It can detect all the well-defined edges within an image, and various Python and C# image-processing libraries make this extremely easy; take OpenCV, for example.
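For example, a minimal Canny call in Python/OpenCV (the file name and the two hysteresis thresholds are assumptions you would tune for your notebook photos):

```python
import cv2

# Minimal Canny sketch; file name and thresholds are assumptions.
img = cv2.imread("notebook.jpg", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150)   # low/high hysteresis thresholds
cv2.imwrite("edges.png", edges)
```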
For detecting the dots, that part is up to you to work out, unless anyone knows of a library that makes that easy as well. I suggest looking at each square detected by the Canny edge detector and checking whether there are any dark pixel values around its middle.
For the data structure, that is also up to you.
Remember that a video is just a sequence of images; apply the same technique to every frame.
What method is suitable to detect the MRZ (machine-readable zone) in a photo of a document? I'm thinking about a cascade classifier (e.g. Viola-Jones), but it seems a bit odd to use it for this problem.
If you know that you will be looking for text in a passport, why not try to find the passport's model points first? Match a template of a passport to it using ASM/AAM (Active Shape Model, Active Appearance Model) techniques. Once you have the passport position information, you can cut out the regions that you are interested in. This will take some time to implement, though.
Consider this approach as a great starting point:
A black top-hat followed by a horizontal derivative highlights long rows of characters.
Morphological closing operation(s) merge the nearby characters and character rows together into a single large blob.
Optional erosion operation(s) remove the small blobs.
Otsu thresholding followed by contour detection, filtering away the contours which are too small, too round, or located in the wrong place, will get you a small number of possible locations for the MRZ.
Finally, compute bounding boxes for the locations you found and see whether you can OCR them successfully.
It may not be the most efficient way to solve the problem, but it is surprisingly robust.
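Here is a hedged Python/OpenCV sketch of that pipeline; the file name, kernel sizes, and contour filters are assumptions you would tune on real document photos:

```python
import cv2
import numpy as np

# Sketch of the pipeline above; file name, kernel sizes and contour
# filters are assumptions to be tuned on real passport photos.
gray = cv2.imread("passport.jpg", cv2.IMREAD_GRAYSCALE)

# 1. Black top-hat highlights dark characters on the lighter background.
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rect_kernel)

# 2. Horizontal derivative emphasizes the long rows of characters.
grad = cv2.Sobel(blackhat, cv2.CV_32F, 1, 0, ksize=3)
grad = np.absolute(grad)
grad = (255 * (grad - grad.min()) / (grad.max() - grad.min())).astype("uint8")

# 3. Closing merges nearby characters into one blob per text row,
#    then Otsu thresholding binarizes the result.
grad = cv2.morphologyEx(grad, cv2.MORPH_CLOSE, rect_kernel)
_, thresh = cv2.threshold(grad, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# 4. Optional erosion removes small speckle blobs.
thresh = cv2.erode(thresh, None, iterations=2)

# 5. Keep only wide, flat contours as MRZ candidates.
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w / float(h) > 5 and w > 0.5 * gray.shape[1]:
        print("MRZ candidate:", (x, y, w, h))   # OCR this region next
```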
A better approach would be the use of projection profile methods. A projection profile method is based on the following idea:
Create an array A with an entry for every row in your b/w input document. Now set A[i] to the number of black pixels in the i-th row of your original image.
(You can also create a vertical projection profile by considering columns in the original image instead of rows.)
Now the array A is the projected row/column histogram of your document and the problem of detecting MRZs can be approached by examining the valleys in the A histogram.
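In code, the row profile is essentially one line of NumPy. A hedged sketch, where the file name and the binarization threshold are assumptions:

```python
import cv2
import numpy as np

# Row projection profile: A[i] = number of black pixels in row i.
gray = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)
binary = gray < 128                    # True where the pixel is "black" (assumed threshold)
A = binary.sum(axis=1)                 # one entry per row

# Examine the peaks and valleys of A to localize text bands such as the MRZ.
print(np.argmax(A))
```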
This problem, however, is not completely solved, so there are many variations and improvements. Here's some additional documentation:
Projection profiles in Google Scholar: http://scholar.google.com/scholar?q=projection+profile+method
Tesseract-ocr, a great open source OCR library: https://code.google.com/p/tesseract-ocr/
Viola & Jones' Haar-like features generate many (many (many)) features to try to describe an object and are a bit more robust to scale and the like. Their approach was a unique approach to a difficult problem.
Here, however, you have plenty of constraints on the problem, and anything like that seems overkill. Rather than "optimizing early", I'd say evaluate the standard off-the-shelf OCR tools and see where they get you. I believe you'll be pleasantly surprised.
PS:
You'll want to preprocess the image to isolate the characters on a white background. This can be done quite easily and will help the OCR algorithms significantly.
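For instance, a minimal preprocessing sketch using Otsu thresholding (the file name is an assumption; Otsu picks the threshold automatically):

```python
import cv2

# One simple way to get dark characters on a clean white background.
gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
_, clean = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
cv2.imwrite("clean.png", clean)       # feed this to the OCR tool
```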
You might want to consider using the stroke width transform.
You can follow these tips to implement it.
I'm trying to build an application which, among other things, is able to recognize chess positions on a computer screen from screenshots. I have very limited experience with image-processing techniques and don't wish to invest a great amount of time in studying them, as this is just a pet project of mine.
Can anyone recommend me one or more image processing techniques that would yield me a good result?
The conditions are:
The image is always crisp and clean: no noise, no poor lighting conditions, etc. (since it's a screenshot)
I expect a very low impact on performance while processing 1 image / second
I've thought of two modes to start the process:
Feed the piece shapes to the program (so that it knows what a queen, king etc. looks like)
Just feed the program an initial image which contains the starting position, from which the program can (after it recognizes the position of the board) pick out each chess piece
The process should be relatively easy to understand, as I don't have a very good grasp of image processing techniques (yet)
I'm not interested in using any specific technology, so technology-agnostic documentation would be ideal (C/C++, C#, Java examples would also be fine).
Thanks for taking the time to read this, and I hope to get some good answers.
It's an interesting problem, but you need to specify a lot more than in your original question in order to find an acceptable answer.
On the input images: "screenshots" is quite a vague category. Can you assume that the chessboard will always be entirely in view? Will you have multiple views of the same board? Can you assume that no pieces will be partially or completely occluded in all views?
On the imaged objects and the capture system: will the same chessboard and pieces be used, under very similar illumination? Will the same lens/camera/digitization pipeline be used?
Hi Andrei,
I have implemented a coin-counting algorithm from a picture, so the process should be helpful.
The algorithm is called the generalized Hough transform:
Make the picture black and white; it is easier that way
Take the image of one piece and "slide it over the screenshot"
For each position, calculate the number of matching pixels between the two images
Where you have the largest count, there you have the piece (see the sketch below)
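The sliding-and-scoring step described above is what OpenCV's matchTemplate does with normalized cross-correlation. A hedged sketch, where the file names and the match threshold are assumptions:

```python
import cv2

# Sketch of the sliding-window step using normalized cross-correlation
# (file names are assumptions; repeat per piece template).
board = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)
piece = cv2.imread("white_queen.png", cv2.IMREAD_GRAYSCALE)

result = cv2.matchTemplate(board, piece, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
if max_val > 0.8:                      # match-quality threshold (assumed)
    print("piece found at", max_loc)
```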
Hope this helps.
Yes, go with the approach in the answer above:
Convert the picture to greyscale
Slice it into 64 squares and store them in an array
Using MATLAB, you can identify the pieces easily
The piece colour can be obtained by calculating the fraction of black pixels:
ratio = no. of black pixels / (no. of black pixels + no. of white pixels)
If the ratio is above a threshold, the piece is BLACK; otherwise it is WHITE (see the sketch below)
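A hedged sketch of the slicing and the black-pixel ratio, assuming the screenshot is already cropped to the board. The thresholds are assumptions, and real code must also account for each square's own background colour:

```python
import cv2
import numpy as np

# Slice a board-only grayscale screenshot into 64 squares and flag
# likely black pieces by the fraction of dark pixels per square.
board = cv2.imread("board.png", cv2.IMREAD_GRAYSCALE)
h, w = board.shape
sq_h, sq_w = h // 8, w // 8

for rank in range(8):
    for file in range(8):
        square = board[rank * sq_h:(rank + 1) * sq_h,
                       file * sq_w:(file + 1) * sq_w]
        black_ratio = np.mean(square < 128)   # assumed binarization threshold
        if black_ratio > 0.15:                # assumed ratio threshold
            print(rank, file, "black piece?")
```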
I'm working on a similar project in C#. Finding which piece is which isn't the hard part for me. The first step is to find a rectangle that shows just the board and crops everything else out. I first hard-coded it to search for the colours of the squares, but I would like to make it more robust and reliable regardless of the colour scheme. I'm trying to make it find squares of pixels that match within a certain threshold and extrapolate the board location from that.
I am looking for an efficient way to detect the small boxes around the numbers (see images).
I already tried the Hough transform with no success. Any ideas? I need some hints! I am using OpenCV...
For inspiration, you can have a look at:
the MATLAB video Sudoku solver demo and explanation
Sudoku Grab, an iPhone app whose author explains the computer-vision part on his blog
Alternatively, if you are always hunting for the same grid you could deploy something like this:
Make a perfect artificial template of the grid and detect or save all coordinates from all corners.
In the target image, do the same thing, for example with Harris points. Be creative, you might also be able to use the distinct triangles that can be found in your images.
Using the coordinates from the template and the found Harris points, determine the affine transformation x = Ax' between the template and the target image. That transformation can then be used to map the template grid onto the target image. At the very least this will give you some prior information to help guide further segmentation (see the sketch below).
The gist of the idea and examples of estimating the affine matrix A can be found on the site of Zisserman's book Multiple View Geometry in Computer Vision and on Peter Kovesi's site.
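A hedged sketch of the estimation step in Python/OpenCV: corners via goodFeaturesToTrack with the Harris score, then estimateAffine2D on matched point pairs. How the template points are matched to the image points is assumed solved here, and the coordinates are made up for illustration:

```python
import cv2
import numpy as np

# Harris-style corners plus affine estimation. The template-to-image
# point matching is left out (assumed solved, e.g. via the distinctive
# triangles mentioned above).
img = cv2.imread("target.png", cv2.IMREAD_GRAYSCALE)
corners = cv2.goodFeaturesToTrack(img, maxCorners=200, qualityLevel=0.01,
                                  minDistance=10, useHarrisDetector=True)

# Given N >= 3 matched point pairs (template -> image), estimate A.
template_pts = np.float32([[0, 0], [100, 0], [0, 100]])      # made-up matches
image_pts = np.float32([[12, 8], [110, 15], [5, 105]])
A, inliers = cv2.estimateAffine2D(template_pts, image_pts)
print(A)   # 2x3 affine matrix mapping the template grid onto the image
```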
I'd start by trying to detect the rectangular boundary of the overall sheet, then applying a perspective transform to make it truly rectangular. Crop that portion of the image out. If possible, then try to make the alternating white and grey sub-rectangles have an equal background brightness - maybe try adaptive histogram equalization.
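A hedged sketch of that rectification step, assuming the four sheet corners have already been found (the file name, coordinates, and output size are made up for illustration):

```python
import cv2
import numpy as np

# Rectify the sheet; corners are assumed found and ordered
# top-left, top-right, bottom-right, bottom-left.
img = cv2.imread("card.jpg")
src = np.float32([[40, 60], [580, 55], [590, 820], [35, 830]])   # assumed
dst = np.float32([[0, 0], [600, 0], [600, 800], [0, 800]])

M = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(img, M, (600, 800))   # truly rectangular crop
cv2.imwrite("rectified.png", warped)
```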
Then the Hough transform might perform better. Alternatively, you could then take an approach that's broadly similar to this demonstration by Robert Bemis on MATLAB Central (it's analysing a DNA microarray image rather than Lotto cards, but it's essentially finding bounding boxes of items arranged in a grid). At a high level, the approach is to calculate the autocorrelation along columns and rows of pixels to detect the periodicity of the items in the grid, and use that to impose a bounding box on each item.
Sorry the above advice is mostly MATLAB-based; I'm afraid I'm not an opencv user, but hopefully it will give you some ideas at least.