Swapping face parts relative to facial landmarks - machine-learning

I have an image of an individual with a beard:
Using a mask, I was able to extract the beard:
I want to move the beard onto another person's face, such as this one:
I want to do this by getting the nose location of the first person and the nose location of the second person, and positioning the beard accordingly.
What are some ways to accomplish this goal? How can I do it using facial landmarks? Are there any non-deep-learning methods of doing this?

You could accomplish this by training a conv-net that takes images of a person's face as input and returns the coordinates of the tip of the nose for positioning the beard. To train a conv-net, you will need data. You can potentially create your own data using an image annotation tool such as this, or find a relevant dataset online such as this one, to train your model. Here is a keypoint detection tutorial to get you started.
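As a rough illustration, a minimal keypoint-regression conv-net in Keras could look like this; the architecture, the 96x96 input size, and the training arrays (`images`, `nose_coords`) are placeholders you would replace with your own:

```python
import tensorflow as tf

# Minimal keypoint-regression conv-net: maps a 96x96 grayscale face
# image to the (x, y) coordinates of the nose tip.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(96, 96, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(2),  # regress (x, y) of the nose tip
])
model.compile(optimizer="adam", loss="mse")

# images: (N, 96, 96, 1) float array; nose_coords: (N, 2) target coordinates
# model.fit(images, nose_coords, epochs=50, validation_split=0.2)
```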
There may be non-deep-learning methods of doing this. For example, you could create SIFT descriptors for a person's nose, find the closest matching descriptors in the images in which you want to detect the nose tip, and use their locations to determine its position in your image of interest. However, I would recommend going with the deep learning approach: keypoint detection is a popular task in deep learning, and you are likely to find many resources online for help.
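That said, if you want to skip training entirely, dlib ships a HOG-based face detector and a 68-point landmark predictor (an ensemble of regression trees, not a deep net) that can give you the nose position directly. A minimal sketch, assuming you have downloaded `shape_predictor_68_face_landmarks.dat` from the dlib model zoo; in the 68-point scheme, index 30 is conventionally the nose tip, and the filenames are placeholders:

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()              # HOG-based, not deep learning
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def nose_tip(image):
    """Return the (x, y) of the nose tip of the first detected face."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if len(faces) == 0:
        raise ValueError("no face found")
    pt = predictor(gray, faces[0]).part(30)              # index 30: nose tip
    return np.array([pt.x, pt.y], dtype=float)

src = cv2.imread("bearded.jpg")                          # person with the beard
dst = cv2.imread("target.jpg")                           # person to receive it
mask = cv2.imread("beard_mask.png", cv2.IMREAD_GRAYSCALE)  # your extraction mask

# Translate the beard by the difference between the two nose positions, so
# it keeps its offset from the nose, but relative to the new nose.
# (Assumes both faces are at similar scale; otherwise also rescale the
# beard, e.g. by the ratio of inter-ocular distances.)
offset = nose_tip(dst) - nose_tip(src)
m = cv2.moments(mask)
beard_center = np.array([m["m10"] / m["m00"], m["m01"] / m["m00"]])
cx, cy = (beard_center + offset).astype(int)

# Poisson blending drops the masked beard region of src into dst.
out = cv2.seamlessClone(src, dst, mask, (int(cx), int(cy)), cv2.NORMAL_CLONE)
cv2.imwrite("result.jpg", out)
```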

Related

Cropping faces from an image

I have a collection of face images, with one or sometimes two faces in each image. What I want to do is find the face in each image and then crop it.
I've tested a couple of methods implemented in Python using OpenCV, but the results weren't that good. These methods are:
1- Implementation 1
2- Implementation 2
There's one more model that I've tested, but I'm not allowed to post more than two links.
The problem is that these Haar-feature-based algorithms are not robust to face size; when I tried them on images taken close to the face, they couldn't find any faces.
Someone suggested trying deep-learning-based algorithms, but I couldn't find one corresponding to what I want to do. Basically, I guess I need a pre-trained model that can give me the coordinates of the face bounding box in the image, or better, a pre-trained model that outputs the cropped face image directly.
You don't need machine learning algorithms; graph algorithms are enough. For example, Snapchat's face recognition algorithm works as follows:
Create a graph with nodes and edges from a most common face (a "standard face").
Deform that graph, i.e., move its nodes onto the matching pixels in the input image.
Voila, you have located the face in the input image.
Easily said, but harder to code. At our university we implemented Dijkstra's algorithm, for example, and I can hand you my "Graph" class if you need it, but I wrote it in C++.
With these graph algorithms you can crop out the faces more efficiently.
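Whichever detector you settle on, the cropping step itself is mechanical once you have a bounding box. Here is a minimal sketch using dlib's HOG-based frontal face detector (a readily available non-deep-learning alternative to the Haar cascades you tried; filenames are placeholders):

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()

img = cv2.imread("input.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# The second argument upsamples the image once before detection,
# which helps with small faces; for close-up faces, 0 is enough.
faces = detector(gray, 1)

for i, rect in enumerate(faces):  # 1 or 2 faces per image, per the question
    # Clip the box to the image bounds before slicing.
    x1, y1 = max(rect.left(), 0), max(rect.top(), 0)
    x2, y2 = min(rect.right(), img.shape[1]), min(rect.bottom(), img.shape[0])
    cv2.imwrite(f"face_{i}.jpg", img[y1:y2, x1:x2])
```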

Object detection using a fish eye lens

I'm trying to use the HOG detector in OpenCV to detect three types of objects in a video feed from a fisheye lens. The types are:
People
Books (when held by some person)
Chairs
A snapshot of the video looks like this image from this website:
I set up the HOG classifier with the default people detector and first tried to detect the people. I noticed that when people appeared at the size you would expect from a non-fisheye lens (something you would get with a standard 35mm lens), they were detected; otherwise they were not. This seemed logical, as the classifier expects people to be a standard size.
I was wondering how I could modify the classifier to detect people through a fisheye lens. The options I see are these:
Undistort the fisheye effect and run the classifier. I would rather not do this, because I'm currently not in a position to calibrate the camera and obtain the distortion coefficients.
Distort images from a people dataset to roughly match the distortion I get through my video, and retrain the classifier. I think this would work, but I would like to understand whether it would work the way I expect.
My questions are:
What would be a valid approach to this problem? Will option #2 work for all three types of objects (people, books, and chairs)?
What is a good classifier that can be trained to identify the three types of objects (cascade, HOG, or anything else; please suggest a library as well)? Will my #2 method of distorting and training with positive and negative examples be a good solution?
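For reference, a minimal version of the setup described above, OpenCV's default HOG people detector, looks roughly like this in Python (the parameters are illustrative and not tuned for a fisheye image):

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("snapshot.jpg")  # one frame of the fisheye feed

# winStride/padding/scale trade speed for recall.
rects, weights = hog.detectMultiScale(frame, winStride=(8, 8),
                                      padding=(16, 16), scale=1.05)
for (x, y, w, h) in rects:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", frame)
```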
Retraining the HOG detector to the performance level of the one included with OpenCV would be a fairly involved process. You would also have to simulate the distortion of your specific lens to modify the training data.
For the quickest solution, I would recommend your first option of undistorting the image. If you are willing to put in the time and resources to retrain the classifier (which you may have to do anyway, depending on how you are detecting chairs and books), there are some publicly available pedestrian datasets that will be useful:
1) http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/
2) http://pascal.inrialpes.fr/data/human/
It's unlikely that you'll find a chair cascade, due to the variability in chair design; I would recommend training your own cascade on the specific chairs you intend to detect. I don't know of any existing cascade for books either, and a quick Google search didn't yield any promising results. A good source of data if you intend to train your own cascade for books is ImageNet.
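If you do pursue option #2, a crude way to simulate fisheye-like distortion on training images is a radial remap. This is only a rough stand-in for a real lens model; the strength `k` is illustrative, not a calibrated coefficient:

```python
import cv2
import numpy as np

def barrel_distort(img, k=0.4):
    """Warp a rectilinear image with a simple radial (barrel) model to
    roughly mimic a fisheye lens."""
    h, w = img.shape[:2]
    ys, xs = np.indices((h, w), dtype=np.float32)
    # normalized coordinates in [-1, 1], centered on the image
    xn = (xs - w / 2) / (w / 2)
    yn = (ys - h / 2) / (h / 2)
    r2 = xn ** 2 + yn ** 2
    # inverse mapping: each output pixel samples a source pixel further
    # out as radius grows, compressing the image toward the edges
    f = 1 + k * r2
    map_x = (xn * f) * (w / 2) + w / 2
    map_y = (yn * f) * (h / 2) + h / 2
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)

distorted = barrel_distort(cv2.imread("pedestrian.jpg"))
cv2.imwrite("pedestrian_fisheye.jpg", distorted)
```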

Why is the shape-indexed feature so effective for face alignment?

I have been implementing a face alignment algorithm recently. I have read the following papers:
Supervised descent method and its applications to face alignment
Face alignment by explicit shape regression
Face alignment at 3000 fps via regressing local binary features
All of these papers mention an important keyword: the shape-indexed feature (or pose-indexed feature). This feature plays a key role in the face alignment process, but I did not get the key point of it. Why is it so important?
A shape-indexed feature is a feature whose index gives some clue about the hierarchical structure of the shape it came from. In face alignment, facial landmarks are extremely important, since they are what make it possible to align faces successfully. But taking facial landmarks into account in isolation throws away some of the structure inherent to a face. You know that the pupil is inside the iris, which is inside the eye. So a shape-indexed feature does more than tell you that you are looking at a facial landmark: it tells you that you are looking at a facial landmark inside another landmark inside another landmark. Because only a few features are 3-nested like that, you can be more confident about aligning them correctly.
Here is a much older paper that explains some of this with simpler language (especially in the introduction): http://www.cs.ubc.ca/~lowe/papers/cvpr97.pdf
To get shape-indexed features, you first apply a similarity transform to the face landmarks in each image. The aim is to map the original landmarks to a specific location, which could be the mean landmark configuration over all images, so that the landmarks of every image sit at the same positions.
Then you extract local features at the relocated landmarks; these are the shape-indexed features, because the landmarks of every image now form a fixed shape.
I searched for hours to find the answer above in a graduation thesis and translated it. I am not sure whether it is the right answer, but in my opinion it makes sense.
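A minimal numpy sketch of that normalization step: align each image's landmarks to a mean shape with a similarity (Procrustes) transform, then sample local patches at the relocated landmarks. The triangle "mean shape" below is a toy stand-in for the real mean landmark configuration:

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity (scale + rotation + translation) mapping the
    (N, 2) point set src onto dst - a plain Procrustes alignment."""
    src_mean, dst_mean = src.mean(0), dst.mean(0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    scale = np.linalg.norm(dst_c) / np.linalg.norm(src_c)   # norm-ratio scale
    u, _, vt = np.linalg.svd(dst_c.T @ src_c)
    d = np.sign(np.linalg.det(u @ vt))                      # guard against reflection
    rot = u @ np.diag([1.0, d]) @ vt
    return lambda pts: scale * (pts - src_mean) @ rot.T + dst_mean

# Toy demo: a rotated, scaled, shifted copy of a "mean shape" maps back onto it.
mean_shape = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
landmarks = 1.7 * mean_shape @ R.T + np.array([3.0, 4.0])

warp = similarity_transform(landmarks, mean_shape)
print(np.allclose(warp(landmarks), mean_shape))             # True

# In a real pipeline you would warp each image's landmarks to the mean shape
# and extract a local patch at each relocated landmark; patch k is then the
# shape-indexed feature for landmark k, comparable across all images.
```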

Hand Detection in OpenCV

I am trying to detect a hand using OpenCV and C++.
I am able to find the contour of the hand (positive image) when a hand is present in the image; basically, I find the largest contour and consider it the hand contour. But if no hand is present, I will still pick some contour and wrongly consider it the hand.
So I started wondering whether I could use a Haar cascade to find the bounding rectangle of the hand and focus on that area, but when I searched online for the XML file, it did not seem to be available the way it is for face detection.
So, given an image, how can I determine which contour in the set belongs to the hand?
You can find a well-trained cascade XML file on GitHub. Here it is:
https://github.com/Aravindlivewire/Opencv/blob/master/haarcascade/aGest.xml
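A minimal sketch of using that cascade in OpenCV, assuming you have downloaded `aGest.xml` from the linked repository (the detection parameters and filenames are illustrative starting points, not tuned values):

```python
import cv2

# Cascade file downloaded from the repository linked above.
hand_cascade = cv2.CascadeClassifier("aGest.xml")

img = cv2.imread("input.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

hands = hand_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in hands:
    # Restrict contour analysis to this region instead of the whole image.
    roi = img[y:y + h, x:x + w]
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("hands.jpg", img)
```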

Object Recognition by Outlines vs Features

Context:
I have RGB-D video from a Kinect, which is aimed straight down at a table. There is a library of around 12 objects I need to identify, alone or several at a time. I have been working with SURF extraction and detection on the RGB image, preprocessing by downscaling to 320x240, converting to grayscale, stretching the contrast, and equalizing the histogram before applying SURF. I built a lasso tool to choose among the detected keypoints in a still of the video image. Those keypoints are then used to build object descriptors, which are used to identify objects in the live video feed.
Problem:
SURF examples show successful identification of objects with a decent amount of text-like feature detail, e.g. logos and patterns. The objects I need to identify are relatively plain but have distinctive geometry. The SURF features found in my stills are sometimes consistent but mostly unimportant surface features. For instance, say I have a wooden cube: SURF detects a few bits of grain on one face, then fails on the other faces. I need to detect (something like) the fact that there are four corners at equal distances and right angles. None of my objects has much of a pattern, but all have distinctive symmetric geometry and color. Think cellphone, lollipop, knife, bowling pin.
My thought was that I could build object descriptors for each significantly different-looking orientation of an object, e.g. two descriptors for a bowling pin (one standing up and one lying down) and two for a cellphone (one lying on its front and one on its back). My recognizer needs rotational invariance and some degree of scale invariance, in case objects are stacked. The ability to deal with some occlusion is preferable (SURF behaves well enough) but not the most important characteristic. Skew invariance would also be preferable, and SURF does well with paper printouts of my objects held by hand at a skew.
Questions:
Am I using the wrong SURF parameters to find features at the wrong scale? Is there a better algorithm for this kind of object identification? Is there something as readily usable as SURF that uses the depth data from the Kinect along with or instead of the RGB data?
I was doing something similar for a project and ended up using a super simple method for object recognition: OpenCV blob detection, recognizing objects based on their areas. Obviously, there needs to be enough variance in area between objects for this method to work.
You can see my results here: http://portfolio.jackkalish.com/Secondhand-Stories
I know there are other methods out there; one possible solution for you could be approxPolyDP, which is described here:
How to detect simple geometric shapes using OpenCV
Would love to hear about your progress on this!
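A minimal sketch combining both ideas, filtering contours by area and then classifying them by the vertex count of their polygon approximation (the thresholds are illustrative, and the contour-return signature below is for OpenCV 4.x):

```python
import cv2

img = cv2.imread("table_view.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# OpenCV 4.x returns (contours, hierarchy); 3.x prepends the image.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for c in contours:
    area = cv2.contourArea(c)
    if area < 500:                        # illustrative noise threshold
        continue
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    # Crude geometry cues: vertex count plus area can distinguish plain
    # objects with distinctive outlines where SURF keypoints are unreliable.
    print(len(approx), area)
```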
