How to use a Neural Network for face detection? - machine-learning

I'm trying to build a face detection system using a neural network written in theano. I am a bit confused as to what the expected output should be, against which I would calculate the cross-entropy. I don't want to know whether a face is present or not; I need to highlight the face in an image (find the location of the face). The size of the images is constant, but the size of the faces in the image is not. How do I go about that? Also, my webcam currently captures 480x640 images. Creating that many neurons in the input layer would be very heavy on the system; how do I compress the images without losing any features?

There are many possible solutions; one of the easiest is to perform a sliding-window search and ask the network "is there a face in this part of the image?" - this is quite a "standard" approach. In particular, you can do it hierarchically: split the image into 9 overlapping squares (I assume the image is square) and ask of each of them "is there a face in it?" by rescaling it to your network's input size. Then split each square that answers "yes" into 9 squares again and repeat. This way you can find the face quite quickly. Another option is to perform supervised segmentation, where you try to predict which parts of the image (pixels/superpixels) belong to a face and which do not. This is not an exhaustive list, but it should give you a general idea of how to proceed.
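A minimal sketch of that hierarchical search, assuming you already have a trained network wrapped in a callable that returns P(face) for a rescaled patch (the model callable, the 48x48 input size, the depth, and the threshold are all illustrative assumptions):

    import cv2

    def has_face(patch, model):
        # Hypothetical hook: rescale the patch to the network's input size
        # and return the network's P(face). Wire this to your theano model.
        x = cv2.resize(patch, (48, 48)).reshape(1, -1) / 255.0
        return float(model(x))

    def search(img, model, depth=3, thresh=0.9):
        # Split the image into 9 overlapping squares (50% overlap) and
        # recurse into every square the network says contains a face.
        # Returns (x, y, w, h) boxes in the original image's coordinates.
        h, w = img.shape[:2]
        if depth == 0 or min(h, w) < 48:
            return [(0, 0, w, h)] if has_face(img, model) > thresh else []
        hits = []
        for i in range(3):
            for j in range(3):
                y, x = i * h // 4, j * w // 4
                sub = img[y:y + h // 2, x:x + w // 2]
                if has_face(sub, model) > thresh:
                    hits += [(x + sx, y + sy, sw, sh)
                             for (sx, sy, sw, sh)
                             in search(sub, model, depth - 1, thresh)]
        return hits

Each level only descends into the squares that respond, so the search narrows toward the face instead of scanning every window at full resolution.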
How do I compress the images without losing any features?
You do not; it is not possible. You will always lose some information when downscaling (lossless compression exists, but it destroys the spatial structure of the image, which makes classification extremely hard).
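In practice, though, faces survive aggressive downscaling well, so the usual compromise is to shrink the input anyway. A minimal OpenCV sketch (the 64x48 target size and file name are arbitrary assumptions):

    import cv2

    frame = cv2.imread('webcam_frame.png')           # 480x640 capture
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # 3 channels -> 1
    small = cv2.resize(gray, (64, 48))               # note: (width, height)
    # 64 * 48 = 3,072 inputs instead of 480 * 640 * 3 = 921,600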

You should first create a training set from the images received through the webcam. The training set must contain both face and non-face images (such as apples, cars, and so on). For better generalization you may use some off-the-shelf data sets. After you have trained the network on these images, you can use it to classify unseen images.
This approach is suitable if your goal is only to detect whether an image contains a face. However, if you want to identify faces (e.g. this face belongs to John and not to someone else), you need to train the network with images of the people you want to identify. The number of classes in such a network equals the number of distinct people.

Related

Neural Network for Learning Cut VS Uncut Grass

I've got a script to take pictures like the one provided, with colored loops encircling either uncut grass, cut grass, or other background details (for the purpose of rejecting non-grass regions), and to generate training data in the form of a bunch of small images taken from inside the colored loops. I'm struggling to work out which type of neural network would learn best from this training data and tell me in real time, from a video feed mounted on a lawn mower, which sections of the image are uncut or cut grass as it mows through a field. Is anyone here experienced with neural networks, and can you either suggest some I could use or just point me in the right direction?
Try a segmentation network; there are many types of segmentation network.
Mind that for neural networks, training data is necessary. Your case (detecting cut vs. uncut grass) is a special one, which means existing pretrained models may not fit your purpose. If so, you'll need a dataset of images and annotations. There are also tools for labeling segmentation images.
Hope it helps.
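Since your training data already consists of small images cut from inside the loops, a patch classifier slid over each video frame is a simpler alternative to full segmentation. A rough sketch, assuming a Keras model trained on 32x32 patches (the model file, patch size, stride, and class order are all illustrative assumptions):

    import numpy as np
    from tensorflow.keras.models import load_model

    model = load_model('grass_patches.h5')   # hypothetical trained classifier
    PATCH, STRIDE = 32, 16
    CLASSES = ['cut', 'uncut', 'other']      # assumed training label order

    def label_frame(frame):
        # Classify overlapping patches and return a coarse grid of labels,
        # one per patch position, covering the whole frame.
        h, w = frame.shape[:2]
        rows = (h - PATCH) // STRIDE + 1
        cols = (w - PATCH) // STRIDE + 1
        grid = np.empty((rows, cols), dtype=object)
        for r in range(rows):
            for c in range(cols):
                y, x = r * STRIDE, c * STRIDE
                patch = frame[y:y + PATCH, x:x + PATCH] / 255.0
                probs = model.predict(patch[np.newaxis], verbose=0)[0]
                grid[r, c] = CLASSES[int(np.argmax(probs))]
        return grid

For real-time use you would batch all patches into a single predict call rather than looping, but the structure is the same.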

Image recognition and Uniqueness detection

I am new to AI/ML and am trying to use them to solve the following problem.
I have a set of (custom) images which, while having common characteristics, will also each have a unique pattern/signature and color value. What set of algorithms should I use to process them in the following manner:
1. Recognize the common characteristic (like presence of a triangle at any position in a 10x10mm image). If present, proceed, else exit.
2. Identify the unique pattern/signature to identify each image individually. The pattern/signature could be a shape (visible to the human eye, or hidden, e.g. an overlay shape blended into the background image with no visible boundaries).
3. Store color tone/hue/saturation to determine any loss/difference (maybe because the capture source is different from the original one).
While this is in a way similar to a face recognition algorithm, in my case saturation/shadow matter, while the matching should be direction-independent.
I figure that using a CNN may be the way to go for step #2 and an SVM for step #1; any input on training specifics will be appreciated. What about step #3, use BGR2HSV? The objective is to use ML/AI and not get into machine vision.
Recognize the common characteristic (like presence of a triangle at any position in a 10x10mm image). If present, proceed, else exit.
In a sense, what you want is a classifier that can detect arbitrary patterns in an image. In practice, however, we can only train classifiers to detect specific types of patterns they have seen before.
For example, I can train a classifier to recognise squares and circles, but if I show it an image with a triangle in it, I cannot expect it to tell me it is a triangle, because it has never seen one before. The downside is that the classifier will end up misclassifying the triangle as one of the shapes it knows: either a square or a circle. The upside is that you can prevent this (see below).
Identify the unique pattern/signature to identify each image individually.
What you want to do is train a classifier on a large amount of labelled data. If you want the classifier to detect squares, circles, or triangles in an image, you must train it with a large amount of labelled images of squares, circles and triangles.
Store color tone/hue/saturation to determine any loss/difference (maybe because the capture source is different from the original one).
Now you are leaving the territory of simple image labelling and entering the world of computer vision. This is not as simple as a vanilla image classifier, but it is possible, and there are a lot of tools to help you do this. For example, you may take a look at OpenCV, which has implementations in Python and C++.
I figure that using a CNN may be the way to go for step #2 and an SVM for step #1
You can combine steps 1 and 2 with a Convolutional Neural Network (CNN); you do not need a two-step prediction process. However, beware: if you pass the CNN an image of a car, it will still label it as a shape. You can, again, circumvent this by training it on a million positive samples of shapes and a million negative samples of random other images with the class "Other". That way, anything that is not a shape gets classified as "Other". This is one possibility.
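A minimal sketch of such a CNN in Keras, with "Other" as an explicit fourth class (the input size and layer sizes are arbitrary assumptions, not a tuned architecture):

    from tensorflow.keras import layers, models

    NUM_CLASSES = 4   # square, circle, triangle, other

    model = models.Sequential([
        layers.Input(shape=(64, 64, 3)),
        layers.Conv2D(32, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(NUM_CLASSES, activation='softmax'),  # "Other" included
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    # model.fit(x_train, y_train, ...) with negatives labelled as "Other"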
What about step #3, use BGR2HSV? The objective is to use ML/AI and not get into machine vision.
With the inclusion of this step, there is no option but to get into computer vision. I am not exactly sure how to go about this, but I can guarantee OpenCV will provide you with a way to do it. In fact, with OpenCV you may no longer need to implement your own CNN, because OpenCV has its own image-labelling libraries.
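For step #3 specifically, a minimal OpenCV sketch of the BGR-to-HSV conversion and a crude colour-drift measure (the file names are placeholders, and mean hue is only a rough statistic since hue is circular, 0-179 in OpenCV):

    import cv2
    import numpy as np

    def hsv_signature(bgr_img):
        # Mean hue/saturation/value over the whole image.
        hsv = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2HSV)
        return hsv.reshape(-1, 3).mean(axis=0)

    original = hsv_signature(cv2.imread('original.png'))
    captured = hsv_signature(cv2.imread('captured.png'))
    print('H/S/V drift:', np.abs(original - captured))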

How to select appropriate positive and negative training image sets for image classification using BOW

I'm trying to implement a real-time object classification program using SVM classification and BoW clustering algorithms. My question is: what are good practices for selecting positive and negative training images?
Positive image sets
Should the background be empty? That is, should the image contain only the object of interest? When this algorithm runs in real time, the test image will not contain only the object of interest; it will definitely include some background information as well. So instead of using an isolated image collection, should I choose images which look more similar to the test images?
Negative image sets
Can these be any image set without the object of interest? Or should they come from the environment where this algorithm is going to be tested, without the object of interest? For example, if I'm going to classify phones in my living room, should the negatives be background images of my living room without the phone in the foreground, or can they be any image set (like kitchen, living room, bedroom, or outdoor images)? I'm asking this because I don't want the system to be environment-specific; it must be robust in any environment (indoors and outdoors).
Thank you. Any help or advice is much appreciated.
Positive image sets
Yes, you should definitely choose images which look more similar to the test images.
Negative image sets
They can be any image set; however, it is better to include images from the environment where the algorithm is going to be tested, without the object of interest.
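For reference, a rough sketch of the BoW-plus-SVM training pipeline using OpenCV ORB features and scikit-learn (train_paths, train_labels, and the vocabulary size K are illustrative assumptions):

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    orb = cv2.ORB_create()
    K = 100                                       # assumed vocabulary size

    def descriptors(path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, des = orb.detectAndCompute(img, None)
        return des                                # None if no keypoints

    # 1. Build the visual vocabulary from all training descriptors.
    all_des = [descriptors(p) for p in train_paths]
    all_des = [d for d in all_des if d is not None]
    kmeans = KMeans(n_clusters=K).fit(np.vstack(all_des).astype(np.float64))

    # 2. Represent each image as a normalised histogram of visual words.
    def bow_histogram(path):
        des = descriptors(path)
        hist = np.zeros(K)
        if des is not None:
            for word in kmeans.predict(des.astype(np.float64)):
                hist[word] += 1
        return hist / max(hist.sum(), 1)

    # 3. Train the SVM on the histograms (labels: 1 = object, 0 = negative).
    X = np.array([bow_histogram(p) for p in train_paths])
    svm = SVC(kernel='rbf').fit(X, train_labels)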

Iris detection in still images

I have good-resolution face images and I would like to automatically detect the iris and determine its color. Is there any state-of-the-art (standard) way to detect the iris other than HoughCircles, which does not give consistent results across different images? One constraint is that I have to work with still images (no video is available).
I am using OpenCV-Python for image processing. Any help is highly appreciated.
I think the problem can be split into two parts:
Localisation of the iris regions
Estimating the colour
Step one is time consuming, but I have done this at my workplace. You can train a Haar cascade classifier on iris images (grayscale) and localise the iris within the eye region returned by a cascade classifier for the eyes. If you already have a collection of face images, you can use them; otherwise, try to collect as many samples as possible with the same image quality as the images you want to process.
Step two is relatively easy, but might not be "very easy" because of auto white balance etc.
If you want a simpler approach, try detecting the white regions of the eye and using them to localise the iris.
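As a starting point before training your own cascade, here is a minimal sketch using OpenCV's bundled pretrained eye cascade, estimating colour from the centre of each detected eye box (the centre-third crop is a crude assumption, not precise iris localisation):

    import cv2

    eye_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + 'haarcascade_eye.xml')

    img = cv2.imread('face.png')
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in eye_cascade.detectMultiScale(gray, 1.1, 5):
        # The iris sits roughly in the middle of the eye box, so take the
        # centre third as a rough localisation.
        iris = img[y + h // 3:y + 2 * h // 3, x + w // 3:x + 2 * w // 3]
        hsv = cv2.cvtColor(iris, cv2.COLOR_BGR2HSV)
        print('mean H/S/V:', hsv.reshape(-1, 3).mean(axis=0))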

Face Recognition Logic

I want to develop an application in which a user inputs an image (of a person) and the system identifies the face in it. The system should also work if there is more than one person in the image.
I need the logic; I have no idea how to work on image pixel data in such a way that it identifies people's faces.
Eigenfaces might be a good algorithm to start with if you're looking to build a system for educational purposes, since it's relatively simple and serves as the starting point for a lot of other algorithms in the field. Basically, what you do is:
1. Take a bunch of face images (training data), convert them to grayscale if they're RGB, and resize them so that every image has the same dimensions.
2. Turn each image into a vector by stacking the columns of its 2D pixel matrix on top of each other.
3. Compute the mean of every pixel position across all the images and subtract it from every image vector, so that the data is centred around zero.
4. Compute the covariance matrix of the result, solve for its eigenvalues and eigenvectors, and find the principal components.
These components will serve as the basis for a vector space, and together they describe the most significant ways in which face images differ from one another.
Once you've done that, you can compute a similarity score for a new face image by converting it into a face vector, projecting it into the new vector space, and computing the Euclidean distance between it and the other projected face vectors.
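A compact NumPy sketch of those training and matching steps; here faces is assumed to be an N x D matrix with one flattened grayscale image per row, all resized to the same dimensions (the matrix and the choice of 50 components are illustrative assumptions):

    import numpy as np

    mean_face = faces.mean(axis=0)
    A = faces - mean_face                   # centre the data

    # Principal components via SVD: the rows of Vt are the eigenvectors of
    # the covariance matrix, i.e. the "eigenfaces".
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    eigenfaces = Vt[:50]                    # keep the top 50 components

    def project(img_vec):
        return eigenfaces @ (img_vec - mean_face)

    gallery = A @ eigenfaces.T              # projected training faces

    def best_match(probe_vec):
        # Index of the closest training face by Euclidean distance.
        d = np.linalg.norm(gallery - project(probe_vec), axis=1)
        return int(np.argmin(d))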
If you decide to go this route, be careful to choose face images that were taken under an appropriate range of lighting conditions and pose angles. Those two factors play a huge role in how well your system will perform when presented with new faces. If the training gallery doesn't account for the properties of a probe image, you're going to get nonsense results. (I once trained an eigenface system on random pictures pulled down from the internet, and it gave me Bill Clinton as the strongest match for a picture of Elizabeth II, even though there was another picture of the Queen in the gallery. They both had white hair, were facing in the same direction, and were photographed under similar lighting conditions, and that was good enough for the computer.)
If you want to pull faces from multiple people in the same image, you're going to need a full system to detect faces, pull them into separate files, and preprocess them so that they're comparable with other faces drawn from other pictures. Those are all huge subjects in their own right. I've seen some good work done by people using skin color and texture-based methods to cut out image components that aren't faces, but these are also highly subject to variations in training data. Color casting is particularly hard to control, which is why grayscale conversion and/or wavelet representations of images are popular.
Machine learning is the keystone of many important processes in an FR system, so I can't stress the importance of good training data enough. There are a bunch of learning algorithms out there, but the most important one in my view is the naive Bayes classifier; the other methods converge on Bayes as the size of the training dataset increases, so you only need to get fancy if you plan to work with smaller datasets. Just remember that the quality of your training data will make or break the system as a whole, and as long as it's solid, you can pick whatever trees you like from the forest of algorithms that have been written to support the enterprise.
EDIT: A good sanity check for your training data is to compute average faces for your probe and gallery images. (This is exactly what it sounds like; after controlling for image size, take the sum of the RGB channels for every image and divide each pixel by the number of images.) The better your preprocessing, the more human the average faces will look. If the two average faces look like different people -- different gender, ethnicity, hair color, whatever -- that's a warning sign that your training data may not be appropriate for what you have in mind.
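A minimal sketch of that average-face check with OpenCV (gallery_paths, probe_paths, and the common 100x100 size are placeholders):

    import cv2
    import numpy as np

    def average_face(paths, size=(100, 100)):
        # Pixelwise mean of all images after resizing to a common size.
        acc = np.zeros((size[1], size[0], 3), dtype=np.float64)
        for p in paths:
            acc += cv2.resize(cv2.imread(p), size).astype(np.float64)
        return (acc / len(paths)).astype(np.uint8)

    cv2.imwrite('avg_gallery.png', average_face(gallery_paths))
    cv2.imwrite('avg_probe.png', average_face(probe_paths))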
Have a look at the Face Recognition Homepage - there are algorithms, papers, and even some source code.
There are many, many different algorithms out there. Basically, what you are looking for is "computer vision". We did a project at university based around facial recognition and detection. What you need to do is google extensively and try to understand all this stuff; there is a bit of mathematics involved, so be prepared. First go to Wikipedia, then search for PDF publications on the specific algorithms.
You can go the hard way - write an implementation of the algorithms yourself - or the easy way: use a computer vision library like OpenCV or OpenVIDIA.
And actually it is not that hard to make something that works, so be brave. What is a lot harder is making software that works under different and constantly varying conditions, and that is where Google won't help you. But I suppose you don't want to go that deep.
