Features extraction and classification - opencv

I'm implementing an ancient coins recognition system. I have used contours detection to extract features of coins. And I thought to use SVM for training images.
My question is how I can give those features to SVM? I got to know that I have to save those features into a file and then that file should feed to the SVM. But, I don't have an idea to save features to a file.
Saving features to a file means save the number of contours in the image, x, y, width and height of each contour right?
Can someone please help me? I am stuck here for two months. Still, I couldn't find the solution for this.
Once I save features to a file, do I have to give the coin name also to the same file or to another file?
Appreciate your help a lot.
Nadeeshani

It depends on what computer vision / image processor library are you using. For example, OpenCV has builtin SVM functionality:
http://opencv.willowgarage.com/documentation/cpp/support_vector_machines.html
so you don't even have to export the features. But LIBSVM ( http://www.csie.ntu.edu.tw/~cjlin/libsvm/) has many more bindings, also for Matlab, for example.
As for how to give the features to the SVM... the input of most categorizers (including SVMs) is a multi-dimensional vector, so you could get one for example concatenating the first 10 x-y-width-height tuples. However, this naive solution is unlikely to work, because if you change the order of the tuples (or you rotate the coin so that the x-y coordinates change), you will get totally different vectors. So try to make up a coin image -> feature vector mapping that doesn't change when the coin is rotated / moved / noise is added. (second idea: features ordered by size, first 5-10, with some shape descriptors instead of simple width / height maybe?)
Coin names are mostly irrelevant at this phase, use 1-of-N encoding for the SVM output.

Related

Image recognition and Uniqueness detection

I am new to AI/ML and am trying to use the same for solving the following problem.
I have a set of (custom) images which while having common characteristics also will have a unique pattern/signature and color value. What set of algorithms should I use to have the pass in following manner:
1. Recognize the common characteristic (like presence of a triangle at any position in a 10x10mm image). If present, proceed, else exit.
2. Identify the unique pattern/signature to identify each image individually. The pattern/signature could be shape (visible to human eye or hidden like using an overlay shape using background image with no boundaries).
3. Store color tone/hue/saturation to determine any loss/difference (maybe because the capture source is different from the original one).
While this is in way similar to face recognition algo, for me saturation/shadow will matter while being direction independent.
I figure that using CNN may be the way to go for step#2 and SVN for step#1, any input on training, specifics will be appreciated. What about step#3, use BGR2HSV? The objective is to use ML/AI and not get into machine-vision.
Recognize the common characteristic (like presence of a triangle at any position in a 10x10mm image). If present, proceed, else exit.
In a sense, what you want is a classifier that can detect patterns in an image. However, we can train classifiers to detect certain types of patterns in images.
For example, I can train a classifier to recognise squares and circles, but if I show it an image with a triangle in it, I cannot expect it to tell me it is a triangle, because it has never seen it before. The downside is, your classifier will end up misclassifying it as one of the shapes it knows to exist: either square or circle. The upside is, you can prevent this.
Identify the unique pattern/signature to identify each image individually.
What you want to do is train a classifier on a large amount of labelled data. If you want the classifier to detect squares, circles, or triangles in an image, you must train it with a large amount of labelled images of squares, circles and triangles.
Store color tone/hue/saturation to determine any loss/difference (maybe because the capture source is different from the original one).
Now, you are leaving the territory of simple image labelling and entering the world of computer vision. This is not as simple as a vanilla image classifier, but it is possible and there are a lot of online tools to help you do this. For example, you may take a look at OpenCV. They have an implementation in python and C++.
I figure that using CNN may be the way to go for step#2 and SVN for
step#1
You can combine step 1 and step 2 with a Convolutional Neural Network (CNN). You do not need to use a two step prediction process. However, beware, if you pass the CNN an image of a car, it will still label it as a shape. You can, again circumvent this by training it on a million positive samples of shapes, and a million negative samples of random other images with the class "Other". This way, anything that is not a shape will get classified into "Other". This is one possibility.
What about step#3, use BGR2HSV? The objective is to use ML/AI and not
get into machine-vision.
With the inclusion of this step, there is no option but to get into computer vision. I am not exactly sure how to go about this, but I can guarantee OpenCV will provide you a way to do this. In fact, with OpenCV, you will no longer need to implement your own CNN, because OpenCV has its own image labelling libraries.

How to train SVM for "Euro" coin recognition with OpenCV 3?

My xmas holiday project this year was to build a little Android app, which should be able to detect arbitrary Euro coins in a picture, recognize their value and sum the values up.
My assumptions/requirements for the picture for a good recognition are
uniform background
picture should be roughly the size of a DinA4 paper
coins may not overlap, but may touch each other
number-side of the coins must be up/visible
My initial thought was, that for the coin value-recognition later it would be best to first detect the actual coins/their regions in the picture. Any recognition then would run on only these regions of the picture, where actual coins are found.
So the first step was to find circles. This i have accomplished using this OpenCV 3 pipeline, as suggested in several books and SO postings:
convert to gray
CannyEdge detection
Gauss blurring
HoughCircle detection
filtering out inner/redundant circles
The detection works rather successfully IMHO, here a picture of the result:
Coins detected with HoughCircles with blue border
Up to the recognition now for every found coin!
I searched for solutions to this problem and came up with
template matching
feature detection
machine learning
The template matching seems very inappropriate for this problem, as the coins can be arbitrary rotated with respect to a template coin (and the template matching algorithm is not rotation-invariant! so i would have to rotate the coins!).
Also pixels of the template coin will never exactly match those of the region of the formerly detected coin. So any algorithm computing the similarity will produce only poor results, i think.
Then i looked into feature detection. This seemed more appropriate to me. I detected the features of a template-coin and the candidate-coin picture and drew the matches (combination of ORB and BRUTEFORCE_HAMMING). Unfortunately the features of the template-coin were also detected in the wrong candidate coins.
See the following picture, where the template or "feature" coin is on the left, a 20 Cents coin. To the right there are the candidate coin, where the left-most coin is a 20 Cents coin. I actually expected this coin to have the most matches, unfortunately not. So again, this seems not to be a viable way to recognize the value of coins.
Feature-matches drawn between a template coin and candidate coins
So machine learning is the third possible solution. From university i still now about neural networks, how they work, etc. Unfortunately my practical knowledge is rather poor AND i don't know Support Vector Machines (SVM) at all, which is the machine learning supported by OpenCV.
So my question is actually not source-code related, but more how to setup the learning process.
Should i learn on the plain coin-images or should i first extract features and learn on the features? (i think: features)
How much positives and negatives per coin should be given?
Would i have to learn also on rotated coins or would this rotation be handled "automagically" by the SVM? So would the SVM recognize rotated coins, even if i only trained it on non-rotated coins?
One of my picture-requirements above ("DinA4") limits the size of the coin to a certain size, e.g. 1/12 of the picture-height. Should i learn on coins of roughly the same size or different sizes? I think, that different sizes would result in different features, which would not help the learning process, what do you think?
Of course, if you have a different possible solution, this is also welcome!
Any help is appreciated! :-)
Bye & Thanks!
Answering your questions:
1- Should i learn on the plain coin-images or should i first extract features and learn on the features? (i think: features)
For many object classification tasks it's better to extract the features first and then train a classifier using a learning algorithm. (e.g the features can be HOG and the learning algorithm can be something like SVM or Adaboost). It's mainly due to the fact that the features have more meaningful information compared to the pixel values. (They can describe edges,shapes, texture, etc.) However, the algorithms like deep learning will extract the useful features as a part of learning procedure.
2 - How much positives and negatives per coin should be given?
You need to answer this question depending on the variation in the classes you want to recognize and the learning algorithm you use. For SVM , if you use HOG features and want to recognize specific numbers on coins you won't need much.
3- Would i have to learn also on rotated coins or would this rotation be handled "automagically" by the SVM? So would the SVM recognize rotated coins, even if i only trained it on non-rotated coins?
Again it depends on your final decision about the features(not SVM which is the learning algorithm) you're going to choose. HOG features are not rotation invariant but there are features like SIFT or SURF which are.
4-One of my picture-requirements above ("DinA4") limits the size of the coin to a certain size, e.g. 1/12 of the picture-height. Should i learn on coins of roughly the same size or different sizes? I think, that different sizes would result in different features, which would not help the learning process, what do you think?
Again, choose your algorithm , some of them ask you for a fixed/similar width/height ratio. You can find out about the specific requirements in related papers.
If you decide to use SVM take a look at this and also if you feel ok with Neural Network, using Tensorflow is a good idea.

Can Haar Cascade be too accurate to be useful in this situation?

I'm making a program to detect shapes from an r/c plane for a competition. I have no real images of the targets, but I do have computer generated examples of them on the rules.
My question is, can I train my program to detect real world objects based on computer generated shapes or should I find a different method to complete this task?
I would like to know before I foolishly generate 5k samples and find them useless in the end.
EDIT: I also don't know the exact color of the objects. If I feed the program samples of varying color, will it be a problem?
Thanks in advance!!
Edit2: Here's what groups from my school detected in previous years
As you can see, the detected images are not nearly as flawless as what would appear in real life. If you can suggest a better method, that would help.
If you think that the real images will have unique colors with simple geometric shapes then you could probably try to create a normalized Hue-histogram. Use it to train SVM classifier. The benefit of using Hue-histogram is that it will be rotational and scale invariant.
You can take the few precautions in mind:
Don't forget to remove the illumination affects.
Sometimes, White and black pixels create some problem in hue-histogram calculation so try to remove them from calculation by considering only those pixel which have S>0 and V>0 in S & V channels of HSV image.
I would rather suggest you to use the real world images because the performance is largely dependent upon training (my personal experience). And why don't you try to use SIFT/SURF descriptors for training to SVM (support vector machine) as SIFT/SURF are scale as well as rotational invariant.

find mosquitos' head in the image

I have images of mosquitos similar to these ones and I would like to automatically circle around the head of each mosquito in the images. They are obviously in different orientations and there are random number of them in different images. some error is fine. Any ideas of algorithms to do this?
This problem resembles a face detection problem, so you could try a naïve approach first and refine it if necessary.
First you would need to recreate your training set. For this you would like to extract small images with examples of what is a mosquito head or what is not.
Then you can use those images to train a classification algorithm, be careful to have a balanced training set, since if your data is skewed to one class it would hit the performance of the algorithm. Since images are 2D and algorithms usually just take 1D arrays as input, you will need to arrange your images to that format as well (for instance: http://en.wikipedia.org/wiki/Row-major_order).
I normally use support vector machines, but other algorithms such as logistic regression could make the trick too. If you decide to use support vector machines I strongly recommend you to check libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/), since it's a very mature library with bindings to several programming languages. Also they have a very easy to follow guide targeted to beginners (http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf).
If you have enough data, you should be able to avoid tolerance to orientation. If you don't have enough data, then you could create more training rows with some samples rotated, so you would have a more representative training set.
As for the prediction what you could do is given an image, cut it using a grid where each cell has the same dimension that the ones you used on your training set. Then you pass each of this image to the classifier and mark those squares where the classifier gave you a positive output. If you really need circles then take the center of the given square and the radius would be the half of the square side size (sorry for stating the obvious).
So after you do this you might have problems with sizes (some mosquitos might appear closer to the camera than others) , since we are not trained the algorithm to be tolerant to scale. Moreover, even with all mosquitos in the same scale, we still might miss some of them just because they didn't fit in our grid perfectly. To address this, we will need to repeat this procedure (grid cut and predict) rescaling the given image to different sizes. How many sizes? well here you would have to determine that through experimentation.
This approach is sensitive to the size of the "window" that you are using, that is also something I would recommend you to experiment with.
There are some research may be useful:
A Multistep Approach for Shape Similarity Search in Image Databases
Representation and Detection of Shapes in Images
From the pictures you provided this seems to be an extremely hard image recognition problem, and I doubt you will get anywhere near acceptable recognition rates.
I would recommend a simpler approach:
First, if you have any control over the images, separate the mosquitoes before taking the picture, and use a white unmarked underground, perhaps even something illuminated from below. This will make separating the mosquitoes much easier.
Then threshold the image. For example here i did a quick try taking the red channel, then substracting the blue channel*5, then applying a threshold of 80:
Use morphological dilation and erosion to get rid of the small leg structures.
Identify blobs of the right size to be moquitoes by Connected Component Labeling. If a blob is large enough to be two mosquitoes, cut it out, and apply some more dilation/erosion to it.
Once you have a single blob like this
you can find the direction of the body using Principal Component Analysis. The head should be the part of the body where the cross-section is the thickest.

Face Recognition Logic

I want to develop an application in which user input an image (of a person), a system should be able to identify face from an image of a person. System also works if there are more than one persons in an image.
I need a logic, I dont have any idea how can work on image pixel data in such a manner that it identifies person faces.
Eigenface might be a good algorithm to start with if you're looking to build a system for educational purposes, since it's relatively simple and serves as the starting point for a lot of other algorithms in the field. Basically what you do is take a bunch of face images (training data), switch them to grayscale if they're RGB, resize them so that every image has the same dimensions, make the images into vectors by stacking the columns of the images (which are now 2D matrices) on top of each other, compute the mean of every pixel value in all the images, and subtract that value from every entry in the matrix so that the component vectors won't be affine. Once that's done, you compute the covariance matrix of the result, solve for its eigenvalues and eigenvectors, and find the principal components. These components will serve as the basis for a vector space, and together describe the most significant ways in which face images differ from one another.
Once you've done that, you can compute a similarity score for a new face image by converting it into a face vector, projecting into the new vector space, and computing the linear distance between it and other projected face vectors.
If you decide to go this route, be careful to choose face images that were taken under an appropriate range of lighting conditions and pose angles. Those two factors play a huge role in how well your system will perform when presented with new faces. If the training gallery doesn't account for the properties of a probe image, you're going to get nonsense results. (I once trained an eigenface system on random pictures pulled down from the internet, and it gave me Bill Clinton as the strongest match for a picture of Elizabeth II, even though there was another picture of the Queen in the gallery. They both had white hair, were facing in the same direction, and were photographed under similar lighting conditions, and that was good enough for the computer.)
If you want to pull faces from multiple people in the same image, you're going to need a full system to detect faces, pull them into separate files, and preprocess them so that they're comparable with other faces drawn from other pictures. Those are all huge subjects in their own right. I've seen some good work done by people using skin color and texture-based methods to cut out image components that aren't faces, but these are also highly subject to variations in training data. Color casting is particularly hard to control, which is why grayscale conversion and/or wavelet representations of images are popular.
Machine learning is the keystone of many important processes in an FR system, so I can't stress the importance of good training data enough. There are a bunch of learning algorithms out there, but the most important one in my view is the naive Bayes classifier; the other methods converge on Bayes as the size of the training dataset increases, so you only need to get fancy if you plan to work with smaller datasets. Just remember that the quality of your training data will make or break the system as a whole, and as long as it's solid, you can pick whatever trees you like from the forest of algorithms that have been written to support the enterprise.
EDIT: A good sanity check for your training data is to compute average faces for your probe and gallery images. (This is exactly what it sounds like; after controlling for image size, take the sum of the RGB channels for every image and divide each pixel by the number of images.) The better your preprocessing, the more human the average faces will look. If the two average faces look like different people -- different gender, ethnicity, hair color, whatever -- that's a warning sign that your training data may not be appropriate for what you have in mind.
Have a look at the Face Recognition Hompage - there are algorithms, papers, and even some source code.
There are many many different alghorithms out there. Basically what you are looking for is "computer vision". We had made a project in university based around facial recognition and detection. What you need to do is google extensively and try to understand all this stuff. There is a bit of mathematics involved so be prepared. First go to wikipedia. Then you will want to search for pdf publications of specific algorithms.
You can go a hard way - write an implementaion of all alghorithms by yourself. Or easy way - use some computer vision library like OpenCV or OpenVIDIA.
And actually it is not that hard to make something that will work. So be brave. A lot harder is to make a software that will work under different and constantly varying conditions. And that is where google won't help you. But I suppose you don't want to go that deep.

Resources