I have a folder of images of a car from every angle. I want to use the bag of words approach to train the system in recognizing the car. Once the training is done, I want that if an image of that car is given it should be able to recognize it.
I have been trying to learn the BOW function in opencv in order to make this work and have come at a level where I do not know what to do now and some guidance would be appreciated.
Here is my code that I used to make the bag of words:
Ptr<FeatureDetector> features = FeatureDetector::create("SIFT");
Ptr<DescriptorExtractor> descriptors = DescriptorExtractor::create("SIFT");
Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("FlannBased");
//defining terms for bowkmeans trainer
TermCriteria tc(MAX_ITER + EPS, 10, 0.001);
int dictionarySize = 1000;
int retries = 1;
int flags = KMEANS_PP_CENTERS;
BOWKMeansTrainer bowTrainer(dictionarySize, tc, retries, flags);
BOWImgDescriptorExtractor bowDE(descriptors, matcher);
//training data now
Mat features;
Mat img = imread("c:\\1.jpg", 0);
Mat img2 = imread("c:\\2.jpg", 0);
vector<KeyPoint> keypoints, keypoints2;
features->detect(img, keypoints);
features->detect(img2,keypoints2);
descriptor->compute(img, keypoints, features);
Mat features2;
descripto->compute(img2, keypoints2, features2);
bowTrainer.add(features);
bowTrainer.add(features2);
Mat dictionary = bowTrainer.cluster();
bowDE.setVocabulary(dictionary);
This is all based on the BOW documentation.
I think at this stage my system is trained. and the next step is predicting.
this is where I dont know what to do. If I use SVM or NormalBayesClassifier they both use the terms train and predict.
How do I predict and train after this? any guidance would be much appreciated. How do I connect the training of the classifier to my `bowDE`` function?
Your next step is to extract the actual bag of word descriptors. You can do this using the compute function from the BOWImgDescriptorExtractor. Something like
bowDE.compute(img, keypoints, bow_descriptor);
Using this function you create descriptors which you then gather into a matrix which serves as the input for the classifier functions. Maybe this tutorial can guide you a little bit.
Another thing I would like to mention is, that for classification you usually need at least 2 classes. So you also need some images which do not contain cars to train a classifier.
Related
I use SIFT to detect, describe feature points in two images as follows.
void FeaturePointMatching::SIFTFeatureMatchers(cv::Mat imgs[2], std::vector<cv::Point2f> fp[2])
{
cv::SiftFeatureDetector dec;
std::vector<cv::KeyPoint>kp1, kp2;
dec.detect(imgs[0], kp1);
dec.detect(imgs[1], kp2);
cv::SiftDescriptorExtractor ext;
cv::Mat desp1, desp2;
ext.compute(imgs[0], kp1, desp1);
ext.compute(imgs[1], kp2, desp2);
cv::BruteForceMatcher<cv::L2<float> > matcher;
std::vector<cv::DMatch> matches;
matcher.match(desp1, desp2, matches);
std::vector<cv::DMatch>::iterator iter;
fp[0].clear();
fp[1].clear();
for (iter = matches.begin(); iter != matches.end(); ++iter)
{
//if (iter->distance > 1000)
// continue;
fp[0].push_back(kp1.at(iter->queryIdx).pt);
fp[1].push_back(kp2.at(iter->trainIdx).pt);
}
// remove outliers
std::vector<uchar> mask;
cv::findFundamentalMat(fp[0], fp[1], cv::FM_RANSAC, 3, 1, mask);
std::vector<cv::Point2f> fp_refined[2];
for (size_t i = 0; i < mask.size(); ++i)
{
if (mask[i] != 0)
{
fp_refined[0].push_back(fp[0][i]);
fp_refined[1].push_back(fp[1][i]);
}
}
std::swap(fp_refined[0], fp[0]);
std::swap(fp_refined[1], fp[1]);
}
In the above code, I use findFundamentalMat() to remove outliers, but in the result img1 and img2 there are still some bad matches. In the images, each green line connects the matched feature point pair. And please ignore red marks. I can not find anything wrong, could anyone give me some hints? Thanks in advance.
RANSAC is just one of the robust estimators. In principle, one can use a variety of them but RANSAC has been shown to work quite well as long as your input data is not dominated by outliers. You can check other variants on RANSAC like MSAC, MLESAC, MAPSAC etc. which have some other interesting properties as well. You may find this CVPR presentation interesting (http://www.imgfsr.com/CVPR2011/Tutorial6/RANSAC_CVPR2011.pdf)
Depending on the quality of the input data, you can estimate the optimal number of RANSAC iterations as described here (https://en.wikipedia.org/wiki/RANSAC#Parameters)
Again, it is one of the robust estimator methods. You may take other statistical approaches like modelling your data with heavy tail distributions, trimmed least squares etc.
In your code you are missing the RANSAC step. RANSAC has basically 2 steps:
generate hypothesis (do a random selection of data points necessary to fit your mode: training data).
model evaluation (evaluate your model on the rest of the points: testing data)
iterate and choose the model that gives the lowest testing error.
RANSAC stands for RANdom SAmple Consensus, it does not remove outliers, it selects a group of points to calculate the fundamental matrix for that group of points. Then you need to do a re projection using the fundamental matrix just calculated with RANSAC to remove the outliers.
Like any algorithm, ransac is not perfect. You can try to run other disponible (in the opencv implementation) robust algorithms, like LMEDS. And you can reiterate, using the last points marked as inliers as input to a new estimation. And you can vary the threshold and confidence level. I suggest run ransac 1 ~ 3 times and then run LMEDS, that does not need a threshold, but only work well with at least +50% of inliers.
And, you can have geometrical order problems:
*If the baseline between the two stereo is too small the fundamental matrix estimation can be unreliable, and may be better to use findHomography() instead, for your purpose.
*if your images have some barrel/pincushin distortion, they are not in conformity with epipolar geometry and the fundamental matrix are not the correct mathematical model to link matches. In this case, you may have to calibrate your camera and then run undistort() and then process the output images.
I am trying to extract different point descriptors (SIFT, SURF, ORB, BRIEF,...) to build Bag of Visual words. The problem seems to be that I am using very small images : 12x60px.
Using a dense extractor I am able to get some keypoints, but then when extracting the descriptor no data is extracted.
Here is the code :
vector<KeyPoint> points;
Mat descriptor; // descriptor of the current image
Ptr<DescriptorExtractor> extractor = DescriptorExtractor::create("BRIEF");
Ptr<FeatureDetector> detector(new DenseFeatureDetector(1.f,1,0.1f,6,0,true,false));
image = imread(filename, 0);
roi = Mat(image,Rect(0,0,12,60));
detector->detect(roi,points);
extractor->compute(roi,points,descriptor);
cout << descriptor << endl;
The result is [] (with BRIEF and ORB) and SegFault (with SURF and SIFT).
Does anyone have a clue on how to densely extract point descriptors from small images on OpenCV ?
Thanks for your help.
Indeed, I finally managed to work my way to a solution. Thanks for the help.
I am now using an Orb detector with initalised parameters instead of a random one, e.g:
Ptr<DescriptorExtractor> extractor(new ORB(500, 1.2f, 8, orbSize, 0, 2, ORB::HARRIS_SCORE, orbSize));
I had to explore the documentation of OpenCV thoroughly before finding the answer to my problem : Orb documentation.
Also if people are using the dense point extractor they should be aware that after the descriptor computing process they may have less keypoints than produced by the keypoint extractor. The descriptor computing removes any keypoints for which it cannot get the data.
BRIEF and ORB use a 32x32 patch to get the descriptor. Since it doesn't fit your image, they remove those keypoints (to avoid returning keypoints without descriptor).
In the case of SURF and SIFT, they can use smaller patches, but it depends on the scale provided by the keypoint. In this case, I guess they have to use a bigger patch and the same as before happens. I don't know why you get a segfault, though; maybe the SIFT/SURF descriptor extractors don't check that keypoints are inside the image boundaries, as BRIEF/ORB ones do.
I have feature vector of size 100 .Total training samples are 500 in which there are 10 samples of each class.I want to design a separate svm classifier for each class.That is classifier of each class will be fed with 10 positive(for that class) and 490 negative instances.
My opencv code is as follows
For training:
Mat trainingDataMat(500, 100, CV_32FC1, trainingData);//trainingData is 2D MATRIX
Mat labelsMat(500, 1, CV_32FC1, labels);//10 positive and 490 -ve labels
CvSVMParams params;
params.svm_type = SVM::C_SVC;
params.kernel_type = SVM::RBF;
CvSVM SVM;
SVM.train_auto(trainingDataMat, labelsMat, Mat(), Mat(), params,5);
SVM.save(name);
For testing
Mat sampleMat(1, size, CV_32FC1, testing_vector);// testing_vector is 1D vector
CvSVM SVM;
SVM.load(name);
float response = SVM.predict(sampleMat);
The problem is that the classifier for class outputs -1 even when I give positive testing sample from the training set and same is the case for other testing samples.
I also tried ONE_CLASS svm but it gives 0 for every testing sample.
Where am I going wrong or what svm type should I use?Please explain with code if possible.
Thank you in advance.
It seems you've missed the normalization step. SVM classifier in OpenCV uses the same as libsvm, and if you read the documentation of libsvm it says you should normalize your train data in the interval [-1,1] and get scale parameters. Then use those scale parameters to scale your test data. This might be the one problem. Or it can be because of non-equivalent number of positive and negative samples. Did tried to classify your train data as a cross validation, after you have trained the SVM?
Try use linear kernel and approximately equal positives and negatives, for each class. You can ajust precision/recall by setting values of gamma and cost parameters. Take a look at: The gamma and cost parameter of SVM
This is a silly question since I'm quite new to SVM,
I've managed to extract features and locations using OpenCV's HoGDescriptor:
vector< float > features;
vector< Point > locations;
hog_descriptors.compute( image, features, Size(0, 0), Size(0, 0), locations );
Then I proceed to use CvSVM to train the SVM based on the features I've extracted.
Mat training_data( features );
CvSVM svm;
svm.train( training_data, labels, Mat(), Mat(), params );
Which gave me an error:
OpenCV Error: Bad argument (There is only a single class) in cvPreprocessCategoricalResponses, file /opt/local/var/macports/build/
My question is that, how do I convert the vector < features > into appropriate matrix to be fed into CvSVM ? Obviously I am doing something wrong, the OpenCV's tutorial shows that a 2D matrix containing the training data is fed into SVM. So, how do I convert vector < features > into a 2D matrix, what are the values in the 2nd dimension ?
What are these features exactly ? Are they the 9 bins consisting of normalized magnitude histograms ?
I found out the issue, since I was testing whether it is correct to pass feature vectors into the SVM in order to train it, I didn't bother to prepare both negative and positive samples.
Yet, CvSVM requires at least 2 different classes for training, that's why the error it threw.
Thanks a lot anyway !
I am trying to extract features using OpenCV's HoG API, however I can't seem to find the API that allow me to do that.
What I am trying to do is to extract features using HoG from all my dataset (a set number of positive and negative images), then train my own SVM.
I peeked into HoG.cpp under OpenCV, and it didn't help. All the codes are buried within complexities and the need to cater for different hardwares (e.g. Intel's IPP)
My question is:
Is there any API from OpenCV that I can use to extract all those features / descriptors to be fed into a SVM ? If there's how can I use it to train my own SVM ?
If there isn't, are there any existing libraries out there, which could accomplish the same thing ?
So far, I am actually porting an existing library (http://hogprocessing.altervista.org/) from Processing (Java) to C++, but it's still very slow, with detection taking around at least 16 seconds
Has anyone else successfully to extract HoG features, how did you go around it ? And do you have any open source codes which I could use ?
Thanks in advance
You can use hog class in opencv as follows
HOGDescriptor hog;
vector<float> ders;
vector<Point> locs;
This function computes the hog features for you
hog.compute(grayImg, ders, Size(32, 32), Size(0, 0), locs);
The HOG features computed for grayImg are stored in ders vector to make it into a matrix, which can be used later for training.
Mat Hogfeat(ders.size(), 1, CV_32FC1);
for(int i=0;i<ders.size();i++)
Hogfeat.at<float>(i,0)=ders.at(i);
Now your HOG features are stored in Hogfeat matrix.
You can also set the window size, cell size and block size by using object hog as follows:
hog.blockSize = 16;
hog.cellSize = 4;
hog.blockStride = 8;
// This is for comparing the HOG features of two images without using any SVM
// (It is not an efficient way but useful when you want to compare only few or two images)
// Simple distance
// Consider you have two HOG feature vectors for two images Hogfeat1 and Hogfeat2 and those are same size.
double distance = 0;
for(int i = 0; i < Hogfeat.rows; i++)
distance += abs(Hogfeat.at<float>(i, 0) - Hogfeat.at<float>(i, 0));
if (distance < Threshold)
cout<<"Two images are of same class"<<endl;
else
cout<<"Two images are of different class"<<endl;
Hope it is useful :)
I also wrote the program of 2 hog feature comparing with the help of the above article.
And I apply this method to check ROI region changing or not.
Please refer to the page here.
source code and simple introduction
Here is GPU version as well.
cv::Mat temp;
gpu::GpuMat gpu_img, descriptors;
cv::gpu::HOGDescriptor gpu_hog(win_size, Size(16, 16), Size(8, 8), Size(8, 8), 9,
cv::gpu::HOGDescriptor::DEFAULT_WIN_SIGMA, 0.2, gamma_corr,
cv::gpu::HOGDescriptor::DEFAULT_NLEVELS);
gpu_img.upload(img);
gpu_hog.getDescriptors(gpu_img, win_stride, descriptors, cv::gpu::HOGDescriptor::DESCR_FORMAT_ROW_BY_ROW);
descriptors.download(temp);
OpenCV 3 provides some changes to the way GPU algorithms (i.e. CUDA) can be used by the user, see the Transition Guide - CUDA.
To update the answer from user3398689 to OpenCV 3, here is a snipped code:
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaimgproc.hpp>
[...]
/* Suppose you load an image in a cv::Mat variable called 'src' */
int img_width = 320;
int img_height = 240;
int block_size = 16;
int bin_number = 9;
cv::Ptr<cv::cuda::HOG> cuda_hog = cuda::HOG::create(Size(img_width, img_height),
Size(block_size, block_size),
Size(block_size/2, block_size/2),
Size(block_size/2, block_size/2),
bin_number);
/* The following commands are optional: default values applies */
cuda_hog->setDescriptorFormat(cuda::HOG::DESCR_FORMAT_COL_BY_COL);
cuda_hog->setGammaCorrection(true);
cuda_hog->setWinStride(Size(img_width_, img_height_));
cv::cuda::GpuMat image;
cv::cuda::GpuMat descriptor;
image.upload(src);
/* May not apply to you */
/* CUDA HOG works with intensity (1 channel) or BGRA (4 channels) images */
/* The next function call convert a standard BGR image to BGRA using the GPU */
cv::cuda::GpuMat image_alpha;
cuda::cvtColor(image, image_alpha, COLOR_BGR2BGRA, 4);
cuda_hog->compute(image_alpha, descriptor);
cv::Mat dst;
image_alpha.download(dst);
You can then use the descriptors in 'dst' variable as you prefer like, e.g., as suggested by G453.