I'm trying to find a measure of similarity between two faces using OpenCV. To do that, I train Eigenfaces / Fisherfaces with 1000 photos of 1000 different people (one photo per person), so the training set also has 1000 labels.
Now I can use the predict method to get the most similar face.
I want to input 2 unknown face images to find if they are both similar to the same vector of faces in the training set.
Here is the OpenCV code that returns the most similar label (the one with the lowest distance).
for(size_t sampleIdx = 0; sampleIdx < _projections.size(); sampleIdx++) {
    double dist = norm(_projections[sampleIdx], q, NORM_L2);
    if((dist < minDist) && (dist < _threshold)) {
        minDist = dist;
        minClass = _labels.at<int>((int)sampleIdx);
    }
}
Questions:
Can anyone tell me how to rewrite this to output the top 10 faces and not just the top 1 ? I'm thinking about pushing them into a priority queue, but maybe there is something easier?!
In the training: should I put all the faces on the same label or on different labels? So should I have 1 label or 1000 ?
Cheers
Here's what I did. Note I'm really good at perl, really newb at C++ (in fact, this is my first c++ project!) so I output a lot to the command line and parsed it with perl.
I went to facerec.cpp as you did, and I changed the contents of the for loop to this:
for(size_t sampleIdx = 0; sampleIdx < _projections.size(); sampleIdx++) {
double dist = norm(_projections[sampleIdx], q, NORM_L2);
int labelClass = _labels.at<int>((int)sampleIdx);
cout << dist << " " << labelClass << endl;
if((dist < minDist) && (dist < _threshold)) {
minDist = dist;
minClass = _labels.at<int>((int)sampleIdx);
}
}
This now outputs the distance and label of every face. Since all the predict function appears to do is take the picture with the shortest distance (lowest number) and return that as the answer, you can now take the resulting list, sort it, and take the first 10 results. Or you can take the first ten labels or whatever. This just gives you access to all of the data rather than the first X results.
I also added
#include <iostream>
using namespace std;
to the top of the file so I could use cout.
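If you would rather get the ten closest faces directly instead of parsing the console output, the sort-and-take-the-first-ten step can be done in C++. What follows is only a sketch and does not use OpenCV: the projections are plain std::vector<double> here, and the names topK, l2Distance and Candidate are made up for illustration. std::partial_sort orders just the first k entries, so there is no need to manage a priority queue by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One candidate: (distance to the query, label of the training sample).
using Candidate = std::pair<double, int>;

// Euclidean (L2) distance between two equally sized projection vectors.
double l2Distance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Returns up to k candidates with the smallest distance, closest first.
std::vector<Candidate> topK(const std::vector<std::vector<double>>& projections,
                            const std::vector<int>& labels,
                            const std::vector<double>& query,
                            std::size_t k) {
    assert(projections.size() == labels.size());
    std::vector<Candidate> all;
    all.reserve(projections.size());
    for (std::size_t i = 0; i < projections.size(); ++i) {
        all.emplace_back(l2Distance(projections[i], query), labels[i]);
    }
    k = std::min(k, all.size());
    // Only the first k elements need to end up in sorted order.
    std::partial_sort(all.begin(), all.begin() + k, all.end());
    all.resize(k);
    return all;
}
```

Inside facerec.cpp the same idea would apply to _projections and _labels, with norm(..., NORM_L2) in place of l2Distance, but that part is untested on my side.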
Q1: Since OpenCV doesn't provide a function for this by default, you have to create your own, using a vector that holds the distance and label of each face. You can write your own function as below and store the distance and label in the vector. Note that you then need to rebuild OpenCV.
virtual void predict(InputArray src, int &label, double &confidence, Vector <variable>) const = 0;
I'm trying to triangulate keypoints around the camera, but only a few points survive the pipeline as inliers and reach the OpenCV method triangulatePoints(). I use OpenCV 4.5.0.
My pipeline is:
I create the feature detector using SURF, which detects the keypoints and computes the descriptors:
detector_ = cv::xfeatures2d::SURF::create(400, 4, 2, false, false);
detector_->detectAndCompute(img, cv::noArray(), keyPoints, descriptors);
I match the keypoints from two adjacent images using 2 candidates per keypoint and filter out the matches where the two candidates are too close to each other.
I use the fundamental matrix to derive the essential matrix, because using findEssentialMat leads to strange results at the end of the pipeline, and I read somewhere that the 5-point algorithm is unreliable.
for (int i = 0; i < (int) goodMatches.size(); i++) {
prevSurvivors.push_back(keyPoints1[goodMatches[i].queryIdx].pt);
curSurvivors.push_back(keyPoints2[goodMatches[i].trainIdx].pt);
}
fundamental_matrix_ = cv::findFundamentalMat(prevSurvivors, curSurvivors, outputMask, cv::FM_RANSAC);
I remove those points from prevSurvivors and curSurvivors for which outputMask is 0. After that I calculate the essential matrix:
essential_matrix_ = INTRINSICS.t() * fundamental_matrix_ * INTRINSICS;
Finally, I check the rank of the essential matrix and call the recoverPose method.
bool hasEssentialMatGoodRank = hasEssentialMatrixAppropriateRank();
if (hasEssentialMatGoodRank) {
outputMask.release();
cv::recoverPose(essential_matrix_, prevSurvivors, curSurvivors, INTRINSICS, R, t, outputMask);
}
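For reference, the "two candidates" filtering step is usually written as a ratio test. This is a minimal sketch of that step only, not my actual code: Match is a hypothetical stand-in for cv::DMatch so that it compiles without OpenCV, and the ratio value is a tuning parameter that has to be chosen by experiment.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for cv::DMatch, so the sketch compiles without OpenCV.
struct Match {
    int queryIdx;
    int trainIdx;
    float distance;
};

// Keeps the best candidate of each keypoint only when it is clearly better
// than the second best one. `ratio` is a tuning parameter in (0, 1].
std::vector<Match> ratioTest(const std::vector<std::vector<Match>>& knnMatches,
                             float ratio) {
    assert(ratio > 0.0f && ratio <= 1.0f);
    std::vector<Match> good;
    for (const std::vector<Match>& candidates : knnMatches) {
        if (candidates.size() < 2) continue;  // ambiguity cannot be judged
        if (candidates[0].distance < ratio * candidates[1].distance) {
            good.push_back(candidates[0]);
        }
    }
    return good;
}
```

A stricter (smaller) ratio leaves fewer but more reliable matches for findFundamentalMat, so this is one of the knobs that changes how many inliers reach the end of the pipeline.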
What I see is that outputMask may have 50 inliers for frames N and N+1, but 0 for frames N+1 and N+2. This breaks my pipeline and I can't understand why.
[Images: frames N, N+1 and N+2; matches between N and N+1; matches between N+1 and N+2]
The questions are:
What am I doing wrong in the pipeline, or
which options, algorithms or methods should I change to get better results?
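To make the essential-matrix step of the pipeline explicit, here it is written out, assuming both frames come from the same camera with intrinsic matrix K (INTRINSICS in the code). The last expression is the usual textbook way of enforcing the essential-matrix constraint and is not something my pipeline does yet:

```latex
E = K^{\top} F K, \qquad
E = U \,\mathrm{diag}(\sigma_1, \sigma_2, \sigma_3)\, V^{\top}, \qquad
\hat{E} = U \,\mathrm{diag}(1, 1, 0)\, V^{\top}
```

A valid essential matrix has two equal singular values and a third one equal to zero. An E computed from a noisy F generally does not satisfy this exactly, so replacing the singular values as in \hat{E} (the overall scale of E is arbitrary) is one option to try before calling recoverPose.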
Say I have a very simple image or shape such as this stick man drawing:
I also have a library of other simple images which I want to compare the first image to and determine the closest match:
Notice that the two stick men are not completely identical but are reasonably similar.
I want to be able to compare the first image to each image in my library until a reasonably close match is found. If necessary, my image library could contain numerous variations of the same image in order to help decide which type of image I have. For example:
My question is whether this is something that OpenCV would be capable of? Has it been done before, and if so, can you point me in the direction of some examples? Many thanks for your help.
Edit: Through my searches I have found many examples of people who are comparing images, or even people who are comparing images which have been stretched or skewed, such as this: Checking images for similarity with OpenCV. Unfortunately, as you can see, my images are not just transformed (rotated/skewed/stretched) versions of one another. They are actually different images, although they are very similar.
You should be able to do it using OpenCV's template matching. You can use the matchTemplate function to look for the feature and then minMaxLoc to find its location. Check out the matchTemplate tutorial on the OpenCV web site.
It seems you need feature point detection and matching. Check these docs from OpenCV:
http://docs.opencv.org/doc/tutorials/features2d/feature_detection/feature_detection.html
http://docs.opencv.org/doc/tutorials/features2d/feature_flann_matcher/feature_flann_matcher.html
For your particular type of images, you might get good results by using moments/HuMoments for the connected components (which you can find with findContours).
Since there is rotation involved, I don't think template matching would work well. You probably need to use feature point detection such as SIFT or SURF.
EDIT: This won't work with rotation. Same for matchTemplate. I have yet to try the findContours + moments approach from bjoernz's answer, which sounds promising.
Failed Solution:
I tried using ShapeContextDistanceExtractor, available in OpenCV 3.0, along with findContours on your sample images and got good results. The sample images were cropped to the same size as the original image (128*200). You could use resize in OpenCV as well.
Code below compares images in images folder with 1.png as the base image.
#include "opencv2/shape.hpp"
#include "opencv2/opencv.hpp"
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
using namespace cv;
const int MAX_SHAPES = 7;
vector<Point> findContours( const Mat& compareToImg )
{
vector<vector<Point> > contour2D;
findContours(compareToImg, contour2D, RETR_LIST, CHAIN_APPROX_NONE);
//converting 2d vector contours to 1D vector for comparison
vector <Point> contour1D;
for (size_t border=0; border < contour2D.size(); border++) {
for (size_t p=0; p < contour2D[border].size(); p++) {
contour1D.push_back( contour2D[border][p] );
}
}
//limiting contours size to reduce distance comparison time
contour1D.resize( 300 );
return contour1D;
}
int main()
{
string path = "./images/";
cv::Ptr <cv::ShapeContextDistanceExtractor> distanceExtractor = cv::createShapeContextDistanceExtractor();
//base image
Mat baseImage= imread( path + "1.png", IMREAD_GRAYSCALE);
vector<Point> baseImageContours= findContours( baseImage );
for ( int idx = 2; idx <= MAX_SHAPES; ++idx ) {
stringstream imgName;
imgName << path << idx << ".png";
Mat compareToImg=imread( imgName.str(), IMREAD_GRAYSCALE ) ;
vector<Point> contii = findContours( compareToImg );
float distance = distanceExtractor->computeDistance( baseImageContours, contii );
std::cout<<" distance to " << idx << " : " << distance << std::endl;
}
return 0;
}
Result
distance to 2 : 89.7951
distance to 3 : 14.6793
distance to 4 : 6.0063
distance to 5 : 4.79834
distance to 6 : 0.0963184
distance to 7 : 0.00212693
Do three things: 1. Forget about image comparison, since you are really comparing stroke symbols. 2. Download and play with a Gesture Search app from the Google store. 3. Realize that for good performance you cannot recognize your strokes without using timestamp information about how the strokes were drawn. Otherwise we would already have successful handwriting recognition. Then you can research the Android stroke recognition library to write your code properly.
I am using opencv-2.4 (CvSVM) for classification. For each test sample it predicts one class as output, but I also need the next class that is closest to the test sample.
Is there any way to find that with the OpenCV SVM classifier?
Unfortunately, you can not do it directly with the current interface.
One solution would be to use the library libsvm instead.
You may do it in opencv, but it will require a little bit of work.
First, you must know that OpenCV uses a "1-against-1" strategy for multi-class classification.
For an N-class problem, it trains N*(N-1)/2 binary classifiers (one for each pair of classes), and then uses a majority vote to choose the most probable class.
You will have to apply each classifier and do the majority vote yourself to get what you want.
The code below shows you how to do that with OpenCV 3 (warning: it is untested and probably contains errors, but it gives you a good starting point).
Ptr<SVM> svm;
int N; //number of classes
Mat data; //input data to classify
Mat sv=svm->getSupportVectors();
Ptr<Kernel> kernel=svm->getKernel();
Mat buffer(1,sv.rows,CV_32F);
kernel->calc(sv.rows, sv.cols , sv.ptr<float>(), data.ptr<float>(), buffer.ptr<float>()); // apply kernel on data (CV_32F vector) and support vectors
Mat alpha, svidx;
vector<int> votes(N, 0); // results of majority vote will be stored here
int i, j, k, dfi;
for( i = dfi = 0; i < N; i++ )
{
    for( j = i+1; j < N; j++, dfi++ )
    {
        // compute score for each binary svm
        double rho=svm->getDecisionFunction(dfi, alpha, svidx);
        double sum = -rho;
        // as far as I can tell, alpha comes back as CV_64F and svidx as CV_32S,
        // and both only cover the support vectors of this binary svm
        for( k = 0; k < (int)svidx.total(); k++ )
            sum += alpha.at<double>(k)*buffer.at<float>(svidx.at<int>(k));
        // majority vote
        votes[sum > 0 ? i : j]++;
    }
}
Edit: This code is adapted from the internal code of Opencv here.
It is incorrect, as pointed out by David Doria in the comments, since there is no getKernel function defined in the SVM class. I am still leaving it here, since it shouldn't be too difficult to modify the internal OpenCV code to add it, and there is apparently no other way to do it.
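Once you have one decision value per binary classifier, from whatever source, getting the second-best class is just a matter of ranking the votes instead of taking only the maximum. Here is a small sketch of that last step, independent of OpenCV. The function name rankClassesByVotes and the flat scores layout are my own assumptions, and the result contains class indices (0..N-1), not the original labels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Ranks classes by one-vs-one votes, most voted first.
// `scores` holds one decision value per class pair, ordered
// (0,1), (0,2), ..., (0,N-1), (1,2), ...; a positive value votes for the
// first class of the pair, anything else votes for the second.
std::vector<int> rankClassesByVotes(const std::vector<double>& scores, int numClasses) {
    assert(numClasses > 0);
    assert(scores.size() ==
           static_cast<std::size_t>(numClasses) * (numClasses - 1) / 2);
    std::vector<int> votes(numClasses, 0);
    std::size_t pairIdx = 0;
    for (int i = 0; i < numClasses; ++i) {
        for (int j = i + 1; j < numClasses; ++j, ++pairIdx) {
            ++votes[scores[pairIdx] > 0 ? i : j];
        }
    }
    std::vector<int> ranking(numClasses);
    std::iota(ranking.begin(), ranking.end(), 0);
    // stable_sort keeps the lower class index first when votes tie.
    std::stable_sort(ranking.begin(), ranking.end(),
                     [&votes](int a, int b) { return votes[a] > votes[b]; });
    return ranking;
}
```

ranking[0] is then the predicted class and ranking[1] the next closest one. Ties are common with few classes, so you may want to break them using the decision values rather than the class index.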
I'm having an issue trying to perform a two dimensional transform on an array of floats using cuFFT. I've had a look at the documentation, but some of the information is contradictory/not clear; so I have a few questions:
My data is 480 rows, with 640 columns (e.g. float data[480][640] but in a single dimension so float data[480*640])
If we say my input dimensions (of real data) are N1 = 480 and N2 = 640. Are the dimensions (after a real to complex transform) N1=480, N2=321?
Can I cudaMemcpy the data directly into a cufftReal array of the same size? Or must it be a cufftComplex array?
If it must be a cufftComplex array, I am assuming the elements need to be in the place of the real components?
What is the correct structure of a call to cufftPlan2d, cufftExecR2C and cufftExecC2R given the above values?
I think that's all for now...
Many thanks in advance
EDIT: So, I've implemented the Forward and Inverse transforms as suggested by JackOLantern. However my results are not what I am expecting (an identical Result after FFT as Before it). I have an image gallery here showing two sets of examples. The first is from my room, the second from my University Project.
In the cuFFT Documentation, there is ambiguity in the use of cufftPlan2d (hence why I asked). In the documentation, for a two dimensional array, the data should be input as above (float data[480][640] == float data[NY][NX]) So NY represents the rows. However in the function listing for cufftPlan2d, it states that nx (the parameter) is for the rows...
Swapping the values of NX and NY in the function call gives the result as in the project image (correct orientation, but split into three partially overlapping images at 1/4 the normal size) however, using the parameters as JackOLantern states in his answer gives a slanted/skewed result.
Am I doing something wrong here? Or does the cuFFT library have issues with this type of thing.
ALSO: I have undone a couple of the edits made by JackOLantern to this question as my issues MAY stem from the fact my data is coming from OpenCV.
EDIT: I've recently found out that I was the one who made a mistake in the way I used the function.
Originally I thought the function definition referred to the size of the data being passed into it.
However, it appears that the parameters actually refer directly to the size of the REAL part.
This means that the parameters refer to:
The size of the input data when using R2C (Real to Complex)
The size of the output data when using C2R (Complex to Real)
So it appears that the cuFFT documentation and the library itself do not correspond.
When performing an R2C followed by a C2R (real to complex, complex to real respectively), the documentation states that for a Real input of NX x NY dimensions, the Complex output is NX x (floor(NY/2) +1); and vice versa.
However the actual output is of dimensions NX x NY and the actual input is of dimensions NX x NY. This is (half) mentioned on the very first page as
C2R - Symmetric complex input to real output
Implying that the complex data must be Symmetric, i.e. must also have the redundant data in addition to the non-redundant data.
There are a number of other contradictions within the documentation as well which I won't go into.
Needless to say, the problem has been solved.
I have included a MWE below. Near the top are a couple of lines with #define NUM_C2 and appropriate comments. Changing this changes whether the documentation format is followed, or my "fix".
The output is
The Input Real data
The Intermediate Complex data
The output Real data
The ratio of the output data to the input data (there are minor FFT errors, ~1 indicates correct)
Feel free to change the parameters (NUM_R and NUM_C) and feel free to comment if you think I have made a mistake somewhere.
#include <iostream>
#include <math.h>
#include <cufft.h>
// e.g. float data[NUM_R][NUM_C]
#define NUM_R 12
#define NUM_C 16
// Documentation Version
//#define NUM_C2 (1+NUM_C/2)
// "Correct" Version
#define NUM_C2 NUM_C
using namespace std;
int main(int argc, char** argv)
{
cufftReal *in_h, *out_h, *in_d, *out_d;
cufftComplex *mid_d, *mid_h;
cufftHandle pF, pI;
int r, c;
in_h = (cufftReal*) malloc(NUM_R * NUM_C * sizeof(cufftReal));
out_h= (cufftReal*) malloc(NUM_R * NUM_C * sizeof(cufftReal));
mid_h= (cufftComplex*)malloc(NUM_C2*NUM_R*sizeof(cufftComplex));
cudaMalloc((void**) &in_d, NUM_R * NUM_C * sizeof(cufftReal));
cudaMalloc((void**)&out_d, NUM_R * NUM_C * sizeof(cufftReal));
cudaMalloc((void**)&mid_d, NUM_C2 * NUM_R * sizeof(cufftComplex));
cufftPlan2d(&pF, NUM_R, NUM_C, CUFFT_R2C);
cufftPlan2d(&pI, NUM_R,NUM_C2, CUFFT_C2R);
cout<<endl<<"------"<<endl;
for(r=0; r<NUM_R; r++)
{
for(c=0; c<NUM_C; c++)
{
in_h[c + NUM_C * r] = cos(2.0*M_PI*(c*7.0/NUM_C+r*3.0/NUM_R));
out_h[c+ NUM_C * r] = 0.f;
cout<<in_h[c+NUM_C*r];
if(c<(NUM_C-1)) cout<<", ";
else cout<<endl;
}
}
cudaMemcpy((cufftReal*)in_d, (cufftReal*)in_h, NUM_R * NUM_C * sizeof(cufftReal),cudaMemcpyHostToDevice);
cufftExecR2C(pF, (cufftReal*)in_d, (cufftComplex*)mid_d);
cudaMemcpy((cufftComplex*)mid_h, (cufftComplex*)mid_d, NUM_C2*NUM_R*sizeof(cufftComplex), cudaMemcpyDeviceToHost);
cout<<endl<<"------"<<endl;
for(r=0; r<NUM_R; r++)
{
for(c=0; c<NUM_C2; c++)
{
cout<<mid_h[c+(NUM_C2)*r].x<<"|"<<mid_h[c+(NUM_C2)*r].y;
if(c<(NUM_C2-1)) cout<<", ";
else cout<<endl;
}
}
cufftExecC2R(pI, (cufftComplex*)mid_d, (cufftReal*)out_d);
cudaMemcpy((cufftReal*)out_h, (cufftReal*)out_d, NUM_R*NUM_C*sizeof(cufftReal), cudaMemcpyDeviceToHost);
cout<<endl<<"------"<<endl;
for(r=0; r<NUM_R; r++)
{
for(c=0; c<NUM_C; c++)
{
cout<<out_h[c+NUM_C*r]/(NUM_R*NUM_C);
if(c<(NUM_C-1)) cout<<", ";
else cout<<endl;
}
}
cout<<endl<<"------"<<endl;
for(r=0; r<NUM_R; r++)
{
for(c=0; c<NUM_C; c++)
{
cout<<(out_h[c+NUM_C*r]/(NUM_R*NUM_C))/in_h[c+NUM_C*r];
if(c<(NUM_C-1)) cout<<", ";
else cout<<endl;
}
}
free(in_h);
free(out_h);
free(mid_h);
cudaFree(in_d);
cudaFree(out_d);
cudaFree(mid_d);
return 0;
}
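For anyone checking the buffer sizes, here is the arithmetic as a small host-side helper (plain C++, no CUDA needed). It assumes, as far as I can tell from the cuFFT documentation, that both the R2C and the C2R plan are created with the logical (real) dimensions and that only the complex buffer has the reduced column count. The struct and function names are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Element counts for a 2D real-to-complex transform of a rows x cols array
// stored row-major (e.g. float data[rows][cols]).
struct R2CSizes {
    std::size_t realElements;     // number of real values (cufftReal)
    std::size_t complexElements;  // number of complex values (cufftComplex)
    int complexCols;              // non-redundant columns per row
};

R2CSizes r2cSizes(int rows, int cols) {
    assert(rows > 0 && cols > 0);
    // Integer division, i.e. floor(cols / 2) + 1.
    const int complexCols = cols / 2 + 1;
    return R2CSizes{
        static_cast<std::size_t>(rows) * cols,
        static_cast<std::size_t>(rows) * complexCols,
        complexCols};
}
```

Under that assumption, the MWE with NUM_C2 set to 1+NUM_C/2 would be asking the C2R plan for a 12 x 9 real transform, which may be why only the NUM_C2 = NUM_C variant round-trips: there both plans get 12 x 16, and the complex buffer is simply larger than it needs to be.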
1) If we say my input dimensions (of real data) are N1 = 480 and N2 = 640. Are the dimensions (after a real to complex transform) N1=480, N2=321?
The output of cufftExecR2C is a NX*(NY/2+1) cufftComplex matrix. So in your case, you will have a 480x321 float2 matrix as output.
2) Can I cudaMemcpy the data directly into a cufftReal array of the same size? Or must it be a cufftComplex array?
If it must be a cufftComplex array, I am assuming the elements need to be in the place of the real components?
Yes, you can copy the N1xN2 data directly into a cufftReal array.
3) What is the correct structure of a call to cufftPlan2d, cufftExecR2C and cufftExecC2R given the above values?
cufftPlan2d(&plan, N1, N2, CUFFT_R2C);
cufftExecR2C(plan, (cufftReal*)idata, (cufftComplex*) odata);
Currently, I'm working on a project in medical engineering. I have a big image with several sub-images of the cell, so my first task is to divide the image.
I thought of the following approach:
Convert the image to binary.
Project the pixel brightness onto the x-axis, so I can see where there are gaps between the bright regions, and then divide the image there.
The problem comes when I try to do the second part. My idea is to use a vector as the projection and sum all the brightness values along each column: position 0 of the vector is the sum of all the brightness values in the first column of the image, and so on until the last column, so at the end I have the projection.
This is how I have tried:
void calculo(cv::Mat &result,cv::Mat &binary){ //result=the sum,binary the imag.
int i,j;
for (i=0;i<=binary.rows;i++){
for(j=0;j<=binary.cols;j++){
cv::Scalar intensity= binaria.at<uchar>(j,i);
result.at<uchar>(i,i)=result.at<uchar>(i,i)+intensity.val[0];
}
cv::Scalar intensity2= result.at<uchar>(i,i);
cout<< "content" "\n"<< intensity2.val[0] << endl;
}
}
When executing this code, I get an access violation error. Another problem is that I cannot create a matrix with a single row, so I don't know what to do.
Any ideas?! Thanks!
In the end it does not work: I need to sum all the pixels in each COLUMN. I did:
cv::Mat suma(cv::Mat& matrix){
int i;
cv::Mat output(1,matrix.cols,CV_64F);
for (i=0;i<=matrix.cols;i++){
output.at<double>(0,i)=norm(matrix.col(i),1);
}
return output;
}
but it gave me an error:
Assertion failed (0 <= colRange.start && colRange.start <= colRange.end && colRange.end <= m.cols) in Mat, file /home/usuario/OpenCV-2.2.0/modules/core/src/matrix.cpp, line 276
I don't know; any idea would be helpful. Anyway, many thanks mevatron, you really put me on the right track.
If you just want the sum of the binary image, you could simply take the L1-norm. Like so:
Mat binaryVectorSum(const Mat& binary)
{
Mat output(1, binary.rows, CV_64F);
for(int i = 0; i < binary.rows; i++)
{
output.at<double>(0, i) = norm(binary.row(i), NORM_L1);
}
return output;
}
I'm at work, so I can't test it out, but that should get you close.
EDIT : Got home. Tested it. It works. :) One caveat...this function gives a pixel count only if your binary matrix is truly binary (i.e., 0's and 1's). You may need to divide the norm output by the maximum value (255) if the binary matrix holds, say, 0's and 255's.
EDIT : If you don't have using namespace cv; in your .cpp file, then you'll need to declare the namespace to use NORM_L1 like this cv::NORM_L1.
Have you considered transposing the matrix before you call the function? Like this:
sumCols = binaryVectorSum(binary.t());
vs.
sumRows = binaryVectorSum(binary);
EDIT : A bug with my code :)
I changed:
Mat output(1, binary.cols, CV_64F);
to
Mat output(1, binary.rows, CV_64F);
My test case was a square matrix, so that bug didn't get found...
Hope that is helpful!
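In case it helps to see the whole projection idea without OpenCV types, here is a minimal sketch on a plain row-major pixel buffer. The function names are made up, and the image is assumed to be 8-bit with 0 as background. Note the loop bounds use '<' rather than '<=', which is what triggers the out-of-range errors in the question. (OpenCV also has cv::reduce, which can sum a matrix down to a single row in one call.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Sums every column of a row-major image with `rows` x `cols` pixels.
std::vector<long> columnProjection(const std::vector<std::uint8_t>& pixels,
                                   int rows, int cols) {
    assert(pixels.size() == static_cast<std::size_t>(rows) * cols);
    std::vector<long> sums(cols, 0);
    for (int r = 0; r < rows; ++r) {      // '<', not '<='
        for (int c = 0; c < cols; ++c) {  // '<', not '<='
            sums[c] += pixels[static_cast<std::size_t>(r) * cols + c];
        }
    }
    return sums;
}

// Returns [first, last] column ranges whose projection is non-zero, i.e. the
// stretches of the image that lie between empty gaps.
std::vector<std::pair<int, int>> nonEmptyRuns(const std::vector<long>& sums) {
    std::vector<std::pair<int, int>> runs;
    const int n = static_cast<int>(sums.size());
    int start = -1;  // -1 means "currently inside a gap"
    for (int c = 0; c < n; ++c) {
        if (sums[c] > 0 && start < 0) start = c;  // a run begins
        if (sums[c] == 0 && start >= 0) {         // the run ended at c - 1
            runs.emplace_back(start, c - 1);
            start = -1;
        }
    }
    if (start >= 0) runs.emplace_back(start, n - 1);  // run touching the edge
    return runs;
}
```

Each returned range is one candidate sub-image along the x-axis. For a 0/255 image, dividing a column sum by 255 gives the number of white pixels in that column.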