how to register face by landmark points - image-processing

Face registration is an important task in a face recognition system. I know how to register a face using two eye-center points, like this. But I do not know how to use more than two points (e.g. the two eye centers, the nose tip and the two mouth corners) to do face registration.
Any idea for that? Thanks in advance!

If you have a lot of points, say 68, then you can perform a Delaunay triangulation and then a piecewise affine warp.
If you have far fewer than 68, say 5 or 6, then you can try a least-squares fit of an affine or perspective transform. I believe you can use OpenCV's findHomography function and then the perspectiveTransform function to perform this step.
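To make the second suggestion concrete, here is a rough sketch of fitting a homography to five landmark pairs and applying it (my own illustration, not the asker's code; the file name is a placeholder and the coordinates are borrowed from the Procrustes example further down):
import cv2
import numpy as np
# landmarks in the input face and the reference face, same order:
# l eye, r eye, nose tip, l mouth corner, r mouth corner
src_pts = np.float32([[106, 91], [147, 95], [129, 111], [104, 130], [141, 135]])
dst_pts = np.float32([[61, 61], [142, 62], [101, 104], [71, 143], [140, 139]])
# least-squares fit over all pairs (method=0); cv2.RANSAC would add robustness
H, mask = cv2.findHomography(src_pts, dst_pts, 0)
img = cv2.imread('face.jpg', 0)                       # placeholder file name
registered = cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))
# the landmarks themselves can be mapped with perspectiveTransform
mapped = cv2.perspectiveTransform(src_pts.reshape(-1, 1, 2), H)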

For 2D alignment - to discover the affine transform that maps a set of landmark points onto another set - you are probably best starting with the classic Procrustes Analysis.
Here someone very graciously provides an implementation (converted from Matlab) in Python.
Using this, here's how I can do what I think you are after...
import procrustes as pc
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import cv2
# Open images...
target_X_img = cv2.imread('arnie1.jpg',0)
input_Y_img = cv2.imread('arnie2.jpg',0)
# Landmark points - same number and order!
# l eye, r eye, nose tip, l mouth, r mouth
X_pts = np.asarray([[61,61],[142,62],[101,104],[71,143],[140,139]])
Y_pts = np.asarray([[106,91],[147,95],[129,111],[104,130],[141,135]])
# Calculate transform via procrustes...
d,Z_pts,Tform = pc.procrustes(X_pts,Y_pts)
# Build and apply transform matrix...
# Note: for affine need 2x3 (a,b,c,d,e,f) form
R = np.eye(3)
R[0:2,0:2] = Tform['rotation']
S = np.eye(3) * Tform['scale']
S[2,2] = 1
t = np.eye(3)
t[0:2,2] = Tform['translation']
M = np.dot(np.dot(R,S),t.T).T
tr_Y_img = cv2.warpAffine(input_Y_img,M[0:2,:],(400,400))
# Confirm points...
aY_pts = np.hstack((Y_pts,np.array(([[1,1,1,1,1]])).T))
tr_Y_pts = np.dot(M,aY_pts.T).T
# Show result - input transformed and superimposed on target...
plt.figure()
plt.subplot(1,3,1)
plt.imshow(target_X_img,cmap=cm.gray)
plt.plot(X_pts[:,0],X_pts[:,1],'bo',markersize=5)
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(input_Y_img,cmap=cm.gray)
plt.plot(Y_pts[:,0],Y_pts[:,1],'ro',markersize=5)
plt.axis('off')
plt.subplot(1,3,3)
plt.imshow(target_X_img,cmap=cm.gray)
plt.imshow(tr_Y_img,alpha=0.6,cmap=cm.gray)
plt.plot(X_pts[:,0],X_pts[:,1],'bo',markersize=5)
plt.plot(Z_pts[:,0],Z_pts[:,1],'ro',markersize=5) # same as...
plt.plot(tr_Y_pts[:,0],tr_Y_pts[:,1],'gx',markersize=5)
plt.axis('off')
plt.show()
All this only holds, of course, for planar/rigid and affine transforms. As soon as you have to cater for non-affine/perspective and deformable surfaces, well, that is an entirely different topic...

Related

How to align two Point Clouds given the set of points and point-to-point correspondence?

Suppose I have two point clouds [x1, x2, x3, ...] and [y1, y2, y3, ...]. These two point clouds should be as close as possible. There are a lot of algorithms and deep learning techniques for point cloud registration problems, but I have the extra information that points x1 and y1 should be aligned, x2 and y2 should be aligned, and so on.
So the order of the points in both point clouds is the same. How can I use this to properly get the transformation matrix that aligns these two point clouds?
Note: these two point clouds are not exactly the same. Actually, I have a ground-truth point cloud [x1, x2, x3, ...] and I tried to reconstruct another point cloud as [y1, y2, y3, ...]. Now I want to match them and visualize them to see whether the reconstruction is good.
The problem you are facing is an overdetermined system of equations, which is solvable with a closed-form expression. No need for iterative methods like ICP, since you have the correspondence between points.
If you're looking for a rigid transform with uniform scaling (i.e. a similarity transform: rotation, translation and scale, but no shearing), you want Umeyama's algorithm [3], which is also closed form. There is a Python implementation here: https://gist.github.com/nh2/bc4e2981b0e213fefd4aaa33edfb3893
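Since the closed form is compact, here is an informal NumPy sketch of Umeyama's estimate (my own outline of [3]; the linked gist is the more complete implementation and should be preferred):
import numpy as np
def umeyama(src, dst):
    """src, dst: (n, d) arrays of corresponding points. Returns (c, R, t) with dst ~ c*R@src + t."""
    n, d = src.shape
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / n                        # (d, d) cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(d)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:     # reflection correction
        S[-1, -1] = -1
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    c = np.trace(np.diag(D) @ S) / var_src           # uniform scale
    t = mu_dst - c * R @ mu_src
    return c, R, t
# usage: aligned = c * src @ R.T + t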
If you are looking for an affine transform between your point clouds, i.e. a linear transform A (which allows shearing, see [2]) together with a translation t (which is not linear), then each of your points must satisfy the equation:
y = Ax + t.
Here we assume the following shapes: y:(d,n), A:(d,d), x:(d,n), t:(d,1) if each cloud has n points in R^d.
You can also write this in homogeneous notation by adding an extra coordinate, see [1]. This results in a linear system y = Mx, and a lot (assuming n > d) of pairs (x, y) that satisfy it, i.e. an overdetermined system.
You can therefore solve it with a closed-form least-squares method:
import numpy as np
import scipy.linalg
# Inputs:
# - P, an (n, dim) [or (dim, n)] matrix: a point cloud of n points in dim dimensions.
# - Q, an (n, dim) [or (dim, n)] matrix: a point cloud of n points in dim dimensions.
# P and Q must have the same shape.
# This function returns:
# - Pt, the P point cloud transformed to fit Q
# - (T, t), the affine transform
def affine_registration(P, Q):
    transposed = False
    if P.shape[0] < P.shape[1]:
        transposed = True
        P = P.T
        Q = Q.T
    (n, dim) = P.shape
    # Compute least squares
    p, res, rnk, s = scipy.linalg.lstsq(np.hstack((P, np.ones([n, 1]))), Q)
    # Get translation
    t = p[-1].T
    # Get transform matrix
    T = p[:-1].T
    # Compute transformed point cloud
    Pt = P @ T.T + t
    if transposed:
        Pt = Pt.T
    return Pt, (T, t)
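A quick synthetic check of the function above (my own illustration, not part of the original answer): generate a cloud, apply a known affine map, and verify that the recovered transform reproduces it.
rng = np.random.default_rng(0)
P = rng.normal(size=(100, 3))                      # source cloud, (n, dim)
A_true = np.array([[1.1, 0.2, 0.0],
                   [0.0, 0.9, 0.1],
                   [0.3, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
Q = P @ A_true.T + t_true                          # exact correspondences
Pt, (T, t) = affine_registration(P, Q)
print(np.allclose(Pt, Q), np.allclose(T, A_true), np.allclose(t, t_true))   # expect True True True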
OpenCV has a function called getAffineTransform(); however, it only takes 3 pairs of points as input: https://theailearner.com/tag/cv2-getaffinetransform/. This won't be robust for your case (if, for example, you give it only the first 3 pairs of points).
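For the 2-D case, an alternative worth knowing about (not mentioned in the original answer) is cv2.estimateAffine2D, which accepts any number of correspondences and fits the 2x3 affine matrix, optionally with RANSAC; a small sketch:
import cv2
import numpy as np
src = np.random.rand(50, 1, 2).astype(np.float32) * 100     # (n, 1, 2) source points
M_true = np.float32([[1.2, 0.1, 5.0],
                     [-0.1, 0.9, -3.0]])                    # ground-truth affine
dst = cv2.transform(src, M_true)                            # apply it exactly
M_est, inliers = cv2.estimateAffine2D(src, dst)             # robust least-squares fit
print(np.allclose(M_est, M_true, atol=1e-3))                # expect True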
References:
[1] https://web.cse.ohio-state.edu/~shen.94/681/Site/Slides_files/transformation_review.pdf#page=24
[2] https://docs.opencv.org/3.4/d4/d61/tutorial_warp_affine.html
[3] https://stackoverflow.com/a/32244818/4195725
As another user already mentioned, the ICP algorithm (an implementation in PCL can be found here) can be used to register two point clouds to each other. However, this only works locally, so the clouds have to be roughly aligned first.
I don't think there is a global registration method in PCL at the moment, but I've used OpenGR, which has a PCL wrapper.
If you know for sure that x1 is near y1, x2 is near y2 etc. you can do a manual alignment which will be a lot faster than global alignment:
Translate 2nd cloud by vector y1-x1
Rotate vector y2-y1 into vector x2-x1
Then refine it using ICP.
This does not account for measurement errors, so using the matrix estimation above will be useful if your data is not 100% correct.
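A rough NumPy sketch of that manual pre-alignment (my own illustration; the file names are placeholders, and one rotational degree of freedom about the x2-x1 axis is deliberately left over for ICP to refine):
import numpy as np
def rotation_between(a, b):
    # Rotation matrix mapping direction a onto direction b (Rodrigues formula).
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.isclose(c, -1.0):                        # opposite directions: 180-degree turn
        axis = np.cross(a, np.eye(3)[np.argmin(np.abs(a))])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    K = np.array([[0, -v[2], v[1]],
                  [v[2], 0, -v[0]],
                  [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K / (1.0 + c)
# X: ground-truth cloud, Y: reconstructed cloud, rows in corresponding order
X = np.loadtxt('ground_truth.xyz')                 # placeholder file names
Y = np.loadtxt('reconstruction.xyz')
Y = Y + (X[0] - Y[0])                              # put y1 on top of x1
R = rotation_between(Y[1] - Y[0], X[1] - X[0])     # turn y2-y1 towards x2-x1
Y = (Y - X[0]) @ R.T + X[0]                        # rotate about the shared anchor point
# remaining roll about the x2-x1 axis (and noise) is then handled by ICP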
VTK's vtkLandmarkTransform also does the same thing, with support for RigidBody/Similarity/Affine transformation:
// need at least four pairs of points in sourcePoints and targetPoints,
// can pick more, but probably not too many
vtkLandmarkTransform landmarkTransform = new vtkLandmarkTransform();
landmarkTransform.SetSourceLandmarks(sourcePoints); // source is to be transformed to match the target
landmarkTransform.SetTargetLandmarks(targetPoints); // target stays still
landmarkTransform.SetMode(VTK_Modes.AFFINE);
landmarkTransform.Modified(); // do the calculation
landmarkTransform.GetMatrix(mtx);
// now you can apply the mtx to all points

how to plot three or even more dimensional multivariate gaussian distribution

In the study of machine learning and pattern recognition, we know that if a sample i has a two-dimensional feature vector like (length, weight), and both length and weight follow a Gaussian distribution, then we can use a multivariate Gaussian distribution to describe it.
Its density is just a 3D plot that looks like this, where the z axis is the probability density.
But what if this sample i has three-dimensional features, x1, x2, x3, ..., xn, or even more? How do we correctly plot it using one plot?
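(For reference, the two-feature case described above can be plotted directly as a density surface; an illustrative sketch, not part of the original question:)
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])
x, y = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-6, 6, 200))
pos = np.dstack((x, y))
z = multivariate_normal(mean, cov).pdf(pos)        # density at every grid point
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z, cmap='viridis')
ax.set_xlabel('x1'); ax.set_ylabel('x2'); ax.set_zlabel('density')
plt.show()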
You can use dimensionality reduction methods to visualize higher dimensional data.
https://scikit-learn.org/stable/auto_examples/manifold/plot_compare_methods.html#sphx-glr-auto-examples-manifold-plot-compare-methods-py
convert the D-dimensional data into 2- or 3-dimensional data
plot the transformed data points in 2 or 3 dimensions, depending on the dimension to which the data was reduced
Let's consider an example: take a 10-dimensional Gaussian.
import matplotlib.pyplot as plt
import numpy as np
DIMENSION = 10
mean = np.zeros((DIMENSION,))
cov = np.eye(DIMENSION)
X = np.random.multivariate_normal(mean, cov, 5000)
Then perform dimensionality reduction (I used PCA; you can choose any other method depending on prior knowledge of which algorithm works well for your particular type of data):
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.decomposition import PCA
X_2d = PCA(n_components=2).fit_transform(X)
X_3d = PCA(n_components=3).fit_transform(X)
Then plot them:
fig = plt.figure(figsize=(12,4))
ax = fig.add_subplot(121, projection='3d')
ax.scatter(X_3d[:,0],X_3d[:,1],X_3d[:,2])
plt.title('3D')
fig.add_subplot(122)
plt.scatter(X_2d[:,0], X_2d[:,1])
plt.title('2D')
plt.show()
You can play with other algorithms as well; each offers a different kind of advantage.
I hope this answers your question.
Note: in higher dimensions, phenomena like the "curse of dimensionality" also come into play, so an accurate projection into lower dimensions may not be possible. It is a bit like the way Greenland appears to be a similar size to Africa on a cartographic map.

Matching a template image in CV2 with a different orientation

So I'm a very experienced developer trying to get into some machine learning / neural network code.
Essentially I need a HUGE dataset, so my first problem is that I need to find a way of labelling a lot of images quickly. So take this as the example.
I was thinking I could use template matching on the main image with the image below it? That way I would simply need to get permission to use this data and I could label it very quickly.
When using OpenCV (from the examples) I get some very funky results which don't find the plate in the image. It does draw boxes, but not around the plate; having tested it, it gets very close a few times, but not often. The code is...
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
img = cv.imread('./image2.jpg',0)
img2 = img.copy()
template = cv.imread('./Plate2.test.png',0)
w, h = template.shape[::-1]
# All the 6 methods for comparison in a list
methods = ['cv.TM_CCOEFF', 'cv.TM_CCOEFF_NORMED', 'cv.TM_CCORR',
           'cv.TM_CCORR_NORMED', 'cv.TM_SQDIFF', 'cv.TM_SQDIFF_NORMED']
for meth in methods:
    img = img2.copy()
    method = eval(meth)
    # Apply template Matching
    res = cv.matchTemplate(img, template, method)
    min_val, max_val, min_loc, max_loc = cv.minMaxLoc(res)
    # If the method is TM_SQDIFF or TM_SQDIFF_NORMED, take minimum
    if method in [cv.TM_SQDIFF, cv.TM_SQDIFF_NORMED]:
        top_left = min_loc
    else:
        top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)
    cv.rectangle(img, top_left, bottom_right, 255, 2)
    plt.subplot(121), plt.imshow(res, cmap='gray')
    plt.title('Matching Result'), plt.xticks([]), plt.yticks([])
    plt.subplot(122), plt.imshow(img, cmap='gray')
    plt.title('Detected Point'), plt.xticks([]), plt.yticks([])
    plt.suptitle(meth)
    plt.show()
The first thing is I'm guessing this isn't working because the main image we're looking for the template in is oriented differently.
The second thing I should point out is that I am NOT a Python programmer, so I'm learning this as well, and this is my first time touching OpenCV, so I'm trying to apply what I DO understand about object detection to things I don't properly understand.
What I want to do is get the coordinates for a bounding box in the MAIN image from the smaller plate. That way I can (with permission) create a decent dataset to train on really quickly - otherwise I have to do it manually :-(
Any help would be greatly appreciated. I have a lot of examples working, but this was an interesting problem I didn't find any reading on.
In my mind the steps are:
1) Find the plate and create a bounding box
2) Train the dataset across as many images as possible for object detection on said plates
3) When testing, extract the plate from the main image and then apply a perspective transform.
4) If you wanted to, you would then do text extraction once you've got the plate flattened out.
UPDATE:
So I tried SIFT from here; the results are as follows (note this image is already in the public domain from the above website) - not quite on target!
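(For reference, the SIFT route generally follows the standard OpenCV feature-matching plus homography recipe; a rough sketch reusing the file names from the code above, not the exact code that produced the result:)
import cv2 as cv
import numpy as np
img = cv.imread('./image2.jpg', 0)            # scene image
template = cv.imread('./Plate2.test.png', 0)  # plate template
sift = cv.SIFT_create()
kp1, des1 = sift.detectAndCompute(template, None)
kp2, des2 = sift.detectAndCompute(img, None)
# Lowe's ratio test on k-nearest-neighbour matches
matcher = cv.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv.findHomography(src, dst, cv.RANSAC, 5.0)
    h, w = template.shape
    corners = np.float32([[0, 0], [0, h - 1], [w - 1, h - 1], [w - 1, 0]]).reshape(-1, 1, 2)
    box = cv.perspectiveTransform(corners, H)          # plate outline in the scene
    cv.polylines(img, [np.int32(box)], True, 255, 3)   # draw the detected quad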
UPDATE 2
I've managed to cobble together a solution from an article, as suggested by JD in the comments. Essentially it lets me label enough images to create a neural network that in turn is much better at detecting them - I'll post an update soon with the answer.

Block scheme detection in text document

I have an image of a text document. It includes text and block schemes. The main problem is to detect the block schemes. I think there are two approaches to solve this task: 1) detect the geometric primitives that make up the scheme; 2) detect the whole scheme.
How can I solve this task? Please give me some approaches.
UPDATE 1
I am trying to detect where in the document the block scheme is placed. An example is shown in the picture below. I am not trying to detect the text inside the block scheme.
UPDATE 2 The main problem is that I should find block schemes of different varieties, even a part of a block scheme.
You can either do 1) object detection or 2) semantic segmentation. I would suggest segmentation, because boundary extraction is crucial for your application.
I'm assuming you have the pages of the documents as images.
The following are the steps involved in a typical segmentation project.
Dataset
Collect the images of the pages required to solve your problem and do preprocessing steps such as image resizing to bring all images in your dataset to a common shape and to reduce the number of computations performed. Be sure to maintain variability in your samples.
Now you have to annotate the regions of the images that you are interested in and mark them with a name; here you are assigning a class (as in classification) to certain regions of the image. You can use the following tools for this:
Labelme -- (my recommendation)
Vgg Annotation tool -- (a highly portable tool written in HTML, but with fewer features than Labelme)
Model
You can use the U-Net model for your task (U-Net paper). It is very easy to implement but performs very robustly on most real-world tasks such as yours.
We have done something similar at work. This is the blog post. We have explained in detail the steps involved in the pipeline, from the data collection stage to the results.
Literature on document layout analysis:
https://arxiv.org/pdf/1804.10371.pdf -- They use U-Net with a ResNet-50 encoder and achieve very good results compared to previous approaches.
https://github.com/leonlulu/DeepLayout -- A Python implementation of a page layout analysis tool using a DeepLab v2 model, which does semantic segmentation.
Conclusion
The approach presented here might seem tedious and time-consuming, but it is robust to variability in the documents when you are testing. Comment below if you have any questions.
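For what it's worth, the U-Net recommended above is essentially an encoder-decoder with skip connections; a deliberately tiny Keras sketch of that structure (illustrative only, assuming TensorFlow is available; the function name and sizes are made up, and a real page-segmentation model would be deeper and trained on the annotated dataset):
import tensorflow as tf
from tensorflow.keras import layers, Model
def tiny_unet(input_shape=(256, 256, 1), num_classes=2):
    inputs = layers.Input(shape=input_shape)
    # encoder
    c1 = layers.Conv2D(16, 3, padding='same', activation='relu')(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, padding='same', activation='relu')(p1)
    p2 = layers.MaxPooling2D()(c2)
    # bottleneck
    b = layers.Conv2D(64, 3, padding='same', activation='relu')(p2)
    # decoder with skip connections
    u2 = layers.Conv2DTranspose(32, 2, strides=2, padding='same')(b)
    u2 = layers.concatenate([u2, c2])
    c3 = layers.Conv2D(32, 3, padding='same', activation='relu')(u2)
    u1 = layers.Conv2DTranspose(16, 2, strides=2, padding='same')(c3)
    u1 = layers.concatenate([u1, c1])
    c4 = layers.Conv2D(16, 3, padding='same', activation='relu')(u1)
    # per-pixel class probabilities
    outputs = layers.Conv2D(num_classes, 1, activation='softmax')(c4)
    return Model(inputs, outputs)
model = tiny_unet()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')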
I would prefer to have more examples of the types of diagram you are searching for, but based on the example you have given, here is my attempt at solving it naively.
1) Resize the image to a manageable size to improve speed and reduce the number of operations.
2) Use a morphological open to cluster all the dark objects together.
3) Binarize the dark objects.
4) Label the objects using OpenCV connected components. This gives us the bounding box of each region.
5) Cluster overlapping bounding boxes together.
6) Analyze each bounding box to find the one containing the diagram. Here you could apply a more sophisticated algorithm like box detection or even arrow detection, but in your example I think a simple box ratio is sufficient.
Here is the code for the implementation
import cv2
import numpy as np
# Function to fill all the bounding box
def fill_rects(image, stats):
    for i, stat in enumerate(stats):
        if i > 0:
            p1 = (stat[0], stat[1])
            p2 = (stat[0] + stat[2], stat[1] + stat[3])
            cv2.rectangle(image, p1, p2, 255, -1)
# image name
img_name = 'test_image.png'
# Load image file
diagram = cv2.imread(img_name,0)
diagram = cv2.blur(diagram,(5,5))
fScale = 0.25
# Make it smaller to speed up everything and easier to cluster
small_img = cv2.resize(diagram,(0,0),fx = fScale, fy = fScale)
img_h, img_w = np.shape(small_img)
# Morphological open to cluster nearby objects
fat_img = cv2.morphologyEx(small_img,cv2.MORPH_OPEN,None,iterations = 1)
# Threshold strong signals
_, bin_img = cv2.threshold(fat_img,210,255,cv2.THRESH_BINARY_INV)
# Analyse connected components
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(bin_img)
# Cluster all the intersected bounding box together
rsmall, csmall = np.shape(small_img)
new_img1 = np.zeros((rsmall, csmall), dtype=np.uint8)
fill_rects(new_img1,stats)
# Analyse New connected components to get filled regions
num_labels_new, labels_new, stats_new, centroids_new = cv2.connectedComponentsWithStats(new_img1)
# Check for regions that satisfy the conditions corresponding to a diagram
min_dia_width = img_w * 0.1
dia_regions = []
for i, stat in enumerate(stats_new):   # use the merged (clustered) components
    if i > 0:
        # get basic dimensions
        x, y, w, h = stat[0:4]
        # calculate aspect ratio
        ratio = w / float(h)
        # if the conditions are met, save the region (scaled back to full size)
        if ratio < 1 and w > min_dia_width:
            dia_regions.append((x / fScale, y / fScale, w / fScale, h / fScale))
# For display purpose
diagram_disp = cv2.imread(img_name)
for region in dia_regions:
    x, y, w, h = region
    x = int(x)
    y = int(y)
    w = int(w)
    h = int(h)
    cv2.rectangle(diagram_disp, (x, y), (x + w, y + h), (0, 255, 0), 2)
labels_disp = np.uint8(200*labels/np.max(labels)) + 50
labels_disp2 = np.uint8(200*labels_new/np.max(labels_new)) + 50
cv2.imshow('small_img',small_img)
cv2.imshow('fat_img',fat_img)
cv2.imshow('bin_img',bin_img)
cv2.imshow("labels",labels_disp)
cv2.imshow("labels_disp2",labels_disp2)
cv2.imshow("diagram_disp",diagram_disp)
cv2.waitKey(0)
Here is the result for another type of input.

Feature Vectors in Radial Basis Function Network

I am trying to use an RBFNN for point cloud to surface reconstruction, but I couldn't understand what my feature vectors would be in the RBFNN.
Can anyone please help me understand this?
A goal to get to this:
From inputs like this:
An RBF network essentially involves fitting data with a linear combination of functions that obey a set of core properties -- chief among these is radial symmetry. The parameters of each of these functions are learned by incremental adjustment based on errors generated through repeated presentation of inputs.
If I understand (it's been a very long time since I used one of these networks), your question pertains to preprocessing of the data in the point cloud. I believe that each of the points in your point cloud should serve as one input. If I understand properly, the features are your three dimensions, and as such each point can already be considered a "feature vector."
You have other choices that remain, namely the number of radial basis neurons in your hidden layer, and the radial basis functions to use (a Gaussian is a popular first choice). The training of the network and the surface reconstruction can be done in a number of ways but I believe this is beyond the scope of the question.
I don't know if it will help, but here's a simple Python implementation of an RBF network performing function approximation with one-dimensional inputs:
import numpy as np
import matplotlib.pyplot as plt
def fit_me(x):
    return (x-2) * (2*x+1) / (1+x**2)

def rbf(x, mu, sigma=1.5):
    return np.exp( -(x-mu)**2 / (2*sigma**2) )
# Core parameters including number of training
# and testing points, minimum and maximum x values
# for training and testing points, and the number
# of rbf (hidden) nodes to use
num_points = 100 # number of inputs (each 1D)
num_rbfs = 20 # number of centers (must be an integer for np.linspace)
x_min = -5
x_max = 10
# Training data, evenly spaced points
x_train = np.linspace(x_min, x_max, num_points)
y_train = fit_me(x_train)
# Testing data, more evenly spaced points
x_test = np.linspace(x_min, x_max, num_points*3)
y_test = fit_me(x_test)
# Centers of each of the rbf nodes
centers = np.linspace(-5, 10, num_rbfs)
# Everything is in place to train the network
# and attempt to approximate the function 'fit_me'.
# Start by creating a matrix G in which each row
# corresponds to an x value within the domain and each
# column i contains the values of rbf_i(x).
center_cols, x_rows = np.meshgrid(centers, x_train)
G = rbf(center_cols, x_rows)
plt.plot(G)
plt.title('Radial Basis Functions')
plt.show()
# Simple training in this case: use pseudoinverse to get weights
weights = np.dot(np.linalg.pinv(G), y_train)
# To test, create meshgrid for test points
center_cols, x_rows = np.meshgrid(centers, x_test)
G_test = rbf(center_cols, x_rows)
# apply weights to G_test
y_predict = np.dot(G_test, weights)
plt.plot(y_predict)
plt.title('Predicted function')
plt.show()
error = y_predict - y_test
plt.plot(error)
plt.title('Function approximation error')
plt.show()
First, you can explore the way in which inputs are provided to the network and how the RBF nodes are used. This should extend to 2D inputs in a straightforward way, though training may get a bit more involved.
To do proper surface reconstruction you'll likely need a representation of the surface that is altogether different from the representation of the function learned here. I'm not sure how to take this last step.
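For what it's worth, the 2D extension mentioned above might look roughly like this: the inputs are (x, y) coordinates and the network learns a height field z = f(x, y). This is a sketch under that simplifying assumption (a height field is not a general surface representation), using the same pseudoinverse training as the 1D example:
import numpy as np
import matplotlib.pyplot as plt
def rbf2d(X, centers, sigma=1.0):
    # Gaussian RBF activations for 2D inputs, based on squared distances to the centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))
def height(x, y):
    # toy "surface" to fit: z = f(x, y)
    return np.sin(x) * np.cos(y)
# training samples scattered over the plane (stand-in for a point cloud's x, y, z)
rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(400, 2))
z_train = height(X_train[:, 0], X_train[:, 1])
# RBF centers on a coarse grid
cx, cy = np.meshgrid(np.linspace(-3, 3, 8), np.linspace(-3, 3, 8))
centers = np.column_stack([cx.ravel(), cy.ravel()])
# pseudoinverse training, as in the 1D example
G = rbf2d(X_train, centers)
weights = np.linalg.pinv(G) @ z_train
# evaluate on a dense grid and plot the reconstructed height field
gx, gy = np.meshgrid(np.linspace(-3, 3, 80), np.linspace(-3, 3, 80))
X_test = np.column_stack([gx.ravel(), gy.ravel()])
z_pred = rbf2d(X_test, centers) @ weights
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(gx, gy, z_pred.reshape(gx.shape))
plt.title('RBF height-field approximation')
plt.show()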
