I am trying to extract features using OpenCV's HoG API, but I can't seem to find the API that allows me to do that.
What I want to do is extract HoG features from my whole dataset (a fixed number of positive and negative images), then train my own SVM.
I peeked into HoG.cpp in OpenCV, and it didn't help. All the code is buried in complexity and in the need to cater for different hardware (e.g. Intel's IPP).
My questions are:
Is there any API in OpenCV that I can use to extract all those features / descriptors to be fed into an SVM? If there is, how can I use it to train my own SVM?
If there isn't, are there any existing libraries that could accomplish the same thing?
So far, I am porting an existing library (http://hogprocessing.altervista.org/) from Processing (Java) to C++, but it's still very slow, with detection taking at least 16 seconds.
Has anyone successfully extracted HoG features? How did you go about it? And do you have any open source code that I could use?
Thanks in advance
You can use the HOGDescriptor class in OpenCV as follows:
HOGDescriptor hog;
vector<float> ders;
vector<Point> locs;
This function computes the HOG features for you:
hog.compute(grayImg, ders, Size(32, 32), Size(0, 0), locs);
The HOG features computed for grayImg are stored in the ders vector. Copy them into a matrix, which can be used later for training:
Mat Hogfeat(ders.size(), 1, CV_32FC1);
for (int i = 0; i < (int)ders.size(); i++)
    Hogfeat.at<float>(i, 0) = ders.at(i);
Now your HOG features are stored in the Hogfeat matrix.
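You can also set the block size, cell size and block stride (and, in the same way, the window size via winSize) through the hog object. These members are of type Size:
hog.blockSize = Size(16, 16);
hog.cellSize = Size(4, 4);
hog.blockStride = Size(8, 8);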
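As a rough check on what these settings do to the output, the length of one window's descriptor follows from the window, block, stride and cell sizes. The sketch below is plain Python arithmetic, not an OpenCV call. The function name `hog_descriptor_length` is made up for illustration, and it assumes the usual HOG layout: overlapping blocks, each holding a grid of cells, with one histogram of `nbins` orientations per cell.

```python
def hog_descriptor_length(win_size, block_size, block_stride, cell_size, nbins):
    """Number of floats in the HOG descriptor of ONE detection window.

    All sizes are (width, height) tuples in pixels. Hypothetical helper,
    written only to illustrate the arithmetic.
    """
    blocks = 1
    cells_per_block = 1
    for win, block, stride, cell in zip(win_size, block_size, block_stride, cell_size):
        # The block must be made of whole cells, and the blocks must tile
        # the window exactly when stepping by the stride.
        if block % cell != 0 or (win - block) % stride != 0:
            raise ValueError("incompatible window/block/stride/cell sizes")
        blocks *= (win - block) // stride + 1
        cells_per_block *= block // cell
    return blocks * cells_per_block * nbins


default_length = hog_descriptor_length((64, 128), (16, 16), (8, 8), (8, 8), 9)
small_cell_length = hog_descriptor_length((64, 128), (16, 16), (8, 8), (4, 4), 9)
```

With a 64x128 window, 16x16 blocks, an 8x8 stride, 8x8 cells and 9 bins this gives 3780 values per window; with 4x4 cells it gives 15120. When compute is run over a larger image with a window stride, the returned vector holds one such descriptor per window position, concatenated.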
// This is for comparing the HOG features of two images without using any SVM
// (It is not an efficient way, but it is useful when you only want to compare two or a few images)
// Simple distance
// Consider you have two HOG feature vectors for two images, Hogfeat1 and Hogfeat2, and they are the same size.
double distance = 0;
for (int i = 0; i < Hogfeat1.rows; i++)
    distance += abs(Hogfeat1.at<float>(i, 0) - Hogfeat2.at<float>(i, 0));
if (distance < Threshold)
    cout << "Two images are of same class" << endl;
else
    cout << "Two images are of different class" << endl;
Hope it is useful :)
I also wrote a program that compares two HOG feature vectors, with the help of the answer above.
I use this method to check whether an ROI has changed or not.
Please refer to the page here.
source code and simple introduction
Here is a GPU version as well.
cv::Mat temp;
gpu::GpuMat gpu_img, descriptors;
cv::gpu::HOGDescriptor gpu_hog(win_size, Size(16, 16), Size(8, 8), Size(8, 8), 9,
                               cv::gpu::HOGDescriptor::DEFAULT_WIN_SIGMA, 0.2, gamma_corr,
                               cv::gpu::HOGDescriptor::DEFAULT_NLEVELS);
// gpu_img must already hold the input image at this point (e.g. filled with gpu_img.upload(...))
gpu_hog.getDescriptors(gpu_img, win_stride, descriptors, cv::gpu::HOGDescriptor::DESCR_FORMAT_ROW_BY_ROW);
OpenCV 3 provides some changes to the way GPU algorithms (i.e. CUDA) can be used by the user, see the Transition Guide - CUDA.
To update the answer from user3398689 to OpenCV 3, here is a code snippet:
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <opencv2/cudaobjdetect.hpp> /* declares cv::cuda::HOG */
/* Suppose you load an image in a cv::Mat variable called 'src' */
int img_width = 320;
int img_height = 240;
int block_size = 16;
int bin_number = 9;
cv::Ptr<cv::cuda::HOG> cuda_hog = cuda::HOG::create(Size(img_width, img_height),
                                                    Size(block_size, block_size),
                                                    Size(block_size/2, block_size/2),
                                                    Size(block_size/2, block_size/2),
                                                    bin_number);
/* The following command is optional: default values apply otherwise */
cuda_hog->setWinStride(Size(img_width, img_height));
cv::cuda::GpuMat image;
cv::cuda::GpuMat descriptor;
image.upload(src); /* copy the input image 'src' to the GPU */
/* May not apply to you */
/* CUDA HOG works with intensity (1 channel) or BGRA (4 channels) images */
/* The next function call converts a standard BGR image to BGRA using the GPU */
cv::cuda::GpuMat image_alpha;
cuda::cvtColor(image, image_alpha, COLOR_BGR2BGRA, 4);
cuda_hog->compute(image_alpha, descriptor);
cv::Mat dst;
descriptor.download(dst); /* copy the descriptors back to host memory */
You can then use the descriptors in the 'dst' variable as you prefer, e.g. as suggested by G453.
Greetings. For the past week (or more) I've been struggling with a problem.
I am developing an app that will allow an expert to create a recipe from a provided image that serves as a base. The recipe consists of areas of interest. The program's purpose is to let non-experts use it: they provide images similar to the original, and the software cross-checks the areas of interest between the recipe image and the provided image.
One use-case scenario could be banknotes. The expert would select an area on a good picture of a genuine banknote, and then the user would provide the software with images of banknotes that need to be checked. So the illumination, as well as the capturing device, could be different.
I don't want you guys to delve into the nature of comparing banknotes; that's another monster to tackle, and I've got it covered for the most part.
My Problem:
Initially I shrink the larger of the two pictures to the size of the smaller one.
So now we are dealing with pictures of the same size. (I actually perform the shrinking on the areas of interest and not the whole picture, but that shouldn't matter.)
I have tried different methodologies to compare these parts, but each one had its limitations due to the nature of the images. Illumination might be different, the provided image might have some sort of contamination, etc.
What have I tried:
Simple image similarity comparison using RGB difference.
The problem is that the provided image could be totally different while its colours are similar, so I would get high percentages on "totally" different banknotes.
SSIM on RGB Images.
Would give really low percentage of similarity on all channels.
SSIM after using sobel filter.
Again low percentage of similarity.
I used SSIM from both Scikit in python and SSIM from OpenCV
Feature matching with Flann.
Couldn't find a good way to use detected matches to extract a similarity.
Basically I am guessing that I need to use various methods and algorithms to achieve the best result. My gut tells me that I will need to combine RGB comparison results with a methodology that will:
Perform some form of edge detection like sobel.
Compare the results based on shape matching or something similar.
I am an image analysis newbie. I also tried to compare the Sobel outputs of the provided images using the mean and std calculations from OpenCV, but either I did it wrong or the results I got were useless anyway. I calculated the Euclidean distance between the vectors that resulted from the mean and std calculation, but I could not use the results, mainly because I couldn't see how they related between images.
I am not providing the code I used, firstly because I scrapped some of it, and secondly because I am not looking for a code solution but for a methodology or some direction to study material. (I've read a shitload of papers already.)
Finally I am not trying to detect similar images, but given two images, extract the similarity between them, trying to bypass small differences created by illumination or paper distortion etc.
Finally I would like to say that I tested all the methods by providing the same image twice and I would get 100% similarity, so I didn't totally fuck it up.
Is what I am trying even possible without some sort of training sets to teach the software what are the acceptable variants of the image? (Again I have no idea if that even makes sense :D )
I think you can try feature matching, e.g. the SURF algorithm with FLANN.
Example of Feature Detection using SURF : https://docs.opencv.org/3.0-beta/doc/tutorials/features2d/feature_detection/feature_detection.html
#include <stdio.h>
#include <iostream>
#include "opencv2/core.hpp"
#include "opencv2/features2d.hpp"
#include "opencv2/xfeatures2d.hpp"
#include "opencv2/highgui.hpp"
using namespace cv;
using namespace cv::xfeatures2d;
void readme();
/** @function main */
int main( int argc, char** argv )
{
  if( argc != 3 )
  { readme(); return -1; }
  Mat img_1 = imread( argv[1], IMREAD_GRAYSCALE );
  Mat img_2 = imread( argv[2], IMREAD_GRAYSCALE );
  if( !img_1.data || !img_2.data )
  { std::cout<< " --(!) Error reading images " << std::endl; return -1; }
  //-- Step 1: Detect the keypoints using SURF Detector
  int minHessian = 400;
  Ptr<SURF> detector = SURF::create( minHessian );
  std::vector<KeyPoint> keypoints_1, keypoints_2;
  detector->detect( img_1, keypoints_1 );
  detector->detect( img_2, keypoints_2 );
  //-- Draw keypoints
  Mat img_keypoints_1; Mat img_keypoints_2;
  drawKeypoints( img_1, keypoints_1, img_keypoints_1, Scalar::all(-1), DrawMatchesFlags::DEFAULT );
  drawKeypoints( img_2, keypoints_2, img_keypoints_2, Scalar::all(-1), DrawMatchesFlags::DEFAULT );
  //-- Show detected (drawn) keypoints
  imshow("Keypoints 1", img_keypoints_1 );
  imshow("Keypoints 2", img_keypoints_2 );
  waitKey(0);
  return 0;
}
/** @function readme */
void readme()
{ std::cout << " Usage: ./SURF_detector <img1> <img2>" << std::endl; }
OK, after some digging around, this is what I came up with:
import numpy as np
import cv2
import sys
import matplotlib.image as mpimg
from skimage import io
from skimage import measure
import time
s = 0
imgA = cv2.imread(sys.argv[1])
imgB = cv2.imread(sys.argv[2])
#imgA = cv2.imread('imageA.bmp')
#imgB = cv2.imread('imageB.bmp')
imgA = cv2.cvtColor(imgA, cv2.COLOR_BGR2GRAY)
imgB = cv2.cvtColor(imgB, cv2.COLOR_BGR2GRAY)
ret,imgA = cv2.threshold(imgA,127,255,0)
ret,imgB = cv2.threshold(imgB,127,255,0)
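# Note: findContours returns three values only in OpenCV 3.x; 2.x and 4.x return (contours, hierarchy)
imgAContours, contoursA, hierarchyA = cv2.findContours(imgA, cv2.RETR_TREE , cv2.CHAIN_APPROX_NONE)
imgBContours, contoursB, hierarchyB = cv2.findContours(imgB, cv2.RETR_TREE , cv2.CHAIN_APPROX_NONE)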
imgAContours = cv2.drawContours(imgAContours,contoursA,-1,(0,0,0),1)
imgBContours = cv2.drawContours(imgBContours,contoursB,-1,(0,0,0),1)
imgAContours = cv2.medianBlur(imgAContours,5)
imgBContours = cv2.medianBlur(imgBContours,5)
#s = 100 * 1/(1+cv2.matchShapes(imgAContours,imgBContours,cv2.CONTOURS_MATCH_I2,0.0))
#s = measure.compare_ssim(imgAContours,imgBContours)
#equality = np.equal(imgAContours,imgBContours)
total = 0.0
matching = 0.0
for x in range(len(imgAContours)):
    for y in range(len(imgAContours[x])):
        total += 1
        t = imgAContours[x,y] == imgBContours[x,y]
        if t:
            matching += 1
s = (matching/total) * 100
Basically, I preprocess the two images as simply as possible, then I find the contours. The matchShapes function from OpenCV was not giving me the results I wanted.
So I create two images using the information from the contours, and then I apply a median blur filter.
Currently, I am doing a simple boolean check, pixel by pixel. However, I am planning to change this in the future to make it smarter, probably with some array math.
If anyone has any suggestions, they are welcome.
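One possible tidy-up of the pixel-by-pixel check is sketched below. It uses only the Python standard library and treats an image as a list of rows; the function name `percent_equal` is made up for illustration. On NumPy arrays of equal shape the same figure can be obtained with `(imgA == imgB).mean() * 100`.

```python
def percent_equal(img_a, img_b):
    """Percentage of pixel positions at which two single-channel images agree.

    Images are given as lists of rows (lists of pixel values).
    Hypothetical helper, shown only as a sketch.
    """
    if len(img_a) != len(img_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same shape")
    total = sum(len(row) for row in img_a)
    if total == 0:
        raise ValueError("images must not be empty")
    # Count positions where both images hold the same value.
    same = sum(
        pixel_a == pixel_b
        for row_a, row_b in zip(img_a, img_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )
    return 100.0 * same / total


score = percent_equal([[0, 255], [255, 0]], [[0, 255], [0, 0]])
```

Here three of the four positions agree, so `score` is 75.0. An exact-equality score like this remains sensitive to small shifts between the two images, which is why the blur step before it matters.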
I am trying to extract different point descriptors (SIFT, SURF, ORB, BRIEF,...) to build Bag of Visual words. The problem seems to be that I am using very small images : 12x60px.
Using a dense extractor I am able to get some keypoints, but then when extracting the descriptor no data is extracted.
Here is the code :
vector<KeyPoint> points;
Mat descriptor; // descriptor of the current image
Ptr<DescriptorExtractor> extractor = DescriptorExtractor::create("BRIEF");
Ptr<FeatureDetector> detector(new DenseFeatureDetector(1.f,1,0.1f,6,0,true,false));
image = imread(filename, 0);
roi = Mat(image,Rect(0,0,12,60));
detector->detect(roi, points);
extractor->compute(roi, points, descriptor);
cout << descriptor << endl;
The result is [] (with BRIEF and ORB) and SegFault (with SURF and SIFT).
Does anyone have a clue on how to densely extract point descriptors from small images on OpenCV ?
Thanks for your help.
Indeed, I finally managed to work my way to a solution. Thanks for the help.
I am now using an ORB extractor with explicitly initialised parameters instead of the default one, e.g.:
Ptr<DescriptorExtractor> extractor(new ORB(500, 1.2f, 8, orbSize, 0, 2, ORB::HARRIS_SCORE, orbSize));
I had to explore the OpenCV documentation thoroughly before finding the answer to my problem: ORB documentation.
Also, people using the dense point extractor should be aware that after the descriptor computation they may have fewer keypoints than the keypoint extractor produced. The descriptor computation removes any keypoints for which it cannot get the data.
BRIEF and ORB compute the descriptor from a fixed-size patch around each keypoint (roughly 32x32 pixels; ORB's default patchSize is 31). Since that patch doesn't fit inside your 12x60 image, they remove those keypoints (to avoid returning keypoints without a descriptor).
SURF and SIFT can use smaller patches, but the patch size depends on the scale provided by the keypoint. In this case, I guess they have to use a bigger patch and the same thing happens as before. I don't know why you get a segfault, though; maybe the SIFT/SURF descriptor extractors don't check that keypoints are inside the image boundaries, as the BRIEF/ORB ones do.
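The border argument can be sketched in a few lines. This is a simplified illustration in plain Python, not OpenCV's actual filtering code; the function name `keypoints_with_full_patch` and the square, keypoint-centred patch model are assumptions made for the example.

```python
def keypoints_with_full_patch(width, height, keypoints, patch_size):
    """Keep only the (x, y) keypoints whose patch_size x patch_size patch,
    centred on the keypoint, lies completely inside a width x height image.

    Hypothetical helper modelling the border check in a simplified way.
    """
    half = patch_size // 2
    return [
        (x, y)
        for (x, y) in keypoints
        if half <= x < width - half and half <= y < height - half
    ]


# A dense grid with a step of 6 pixels on a 12x60 image, as in the question.
grid = [(x, y) for x in range(0, 12, 6) for y in range(0, 60, 6)]
survivors = keypoints_with_full_patch(12, 60, grid, patch_size=31)
```

For a 12-pixel-wide image and a 31-pixel patch no x coordinate can satisfy the check, so `survivors` is empty whatever the grid looks like. Under this model the image would have to be at least as large as the patch in both dimensions, or the patch size would have to be reduced, as the ORB parameters above do.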
I have a folder of images of a car from every angle. I want to use the bag-of-words approach to train the system to recognize the car. Once the training is done, I want it to recognize that car when given an image of it.
I have been trying to learn the BOW functions in OpenCV in order to make this work, and I have reached a point where I do not know what to do next. Some guidance would be appreciated.
Here is my code that I used to make the bag of words:
Ptr<FeatureDetector> detector = FeatureDetector::create("SIFT");
Ptr<DescriptorExtractor> extractor = DescriptorExtractor::create("SIFT");
Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("FlannBased");
//defining terms for bowkmeans trainer
TermCriteria tc(TermCriteria::MAX_ITER + TermCriteria::EPS, 10, 0.001);
int dictionarySize = 1000;
int retries = 1;
int flags = KMEANS_PP_CENTERS;
BOWKMeansTrainer bowTrainer(dictionarySize, tc, retries, flags);
BOWImgDescriptorExtractor bowDE(extractor, matcher);
//training data now
Mat img = imread("c:\\1.jpg", 0);
Mat img2 = imread("c:\\2.jpg", 0);
vector<KeyPoint> keypoints, keypoints2;
Mat features, features2;
detector->detect(img, keypoints);
extractor->compute(img, keypoints, features);
bowTrainer.add(features);
detector->detect(img2, keypoints2);
extractor->compute(img2, keypoints2, features2);
bowTrainer.add(features2);
//cluster all added descriptors into the visual vocabulary
Mat dictionary = bowTrainer.cluster();
This is all based on the BOW documentation.
I think at this stage my system is trained, and the next step is predicting.
This is where I don't know what to do. If I use SVM or NormalBayesClassifier, they both use the terms train and predict.
How do I train and predict after this? Any guidance would be much appreciated. How do I connect the training of the classifier to my `bowDE` object?
Your next step is to extract the actual bag-of-words descriptors. You can do this using the compute function of BOWImgDescriptorExtractor, after giving it the dictionary with setVocabulary. Something like:
bowDE.setVocabulary(dictionary);
bowDE.compute(img, keypoints, bow_descriptor);
Using this function you create descriptors, which you then gather into a matrix that serves as the input for the classifier functions. Maybe this tutorial can guide you a little bit.
Another thing I would like to mention is that for classification you usually need at least 2 classes. So you also need some images that do not contain cars to train a classifier.
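To make the idea of a bag-of-words descriptor concrete, here is a minimal sketch in plain Python. It is an illustration of the technique, not OpenCV's implementation: the function name `bow_histogram` is made up, and the brute-force nearest-word search with squared Euclidean distance is an assumption (the question's code uses a FLANN matcher for that step).

```python
def bow_histogram(descriptors, vocabulary):
    """Normalised histogram of visual-word occurrences for one image.

    descriptors: list of local feature vectors extracted from the image.
    vocabulary:  list of cluster centres (the dictionary), same dimension.
    Hypothetical helper, shown only as a sketch.
    """
    if not descriptors:
        raise ValueError("need at least one descriptor")
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        # Index of the closest visual word for this descriptor.
        nearest = min(
            range(len(vocabulary)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[k])),
        )
        counts[nearest] += 1
    # Normalise so that images with different keypoint counts are comparable.
    return [count / len(descriptors) for count in counts]


vocabulary = [[0.0, 0.0], [10.0, 10.0]]
histogram = bow_histogram([[1, 0], [0, 1], [2, 2], [9, 9]], vocabulary)
```

Three of the four descriptors fall closest to the first word, so `histogram` is [0.75, 0.25]. Each image yields one such fixed-length vector, whatever its number of keypoints, and those vectors (one row per image, with a class label each) are what the classifier's train function receives.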
This is a silly question since I'm quite new to SVM,
I've managed to extract features and locations using OpenCV's HoGDescriptor:
vector< float > features;
vector< Point > locations;
hog_descriptors.compute( image, features, Size(0, 0), Size(0, 0), locations );
Then I proceed to use CvSVM to train the SVM based on the features I've extracted.
Mat training_data( features );
CvSVM svm;
svm.train( training_data, labels, Mat(), Mat(), params );
Which gave me an error:
OpenCV Error: Bad argument (There is only a single class) in cvPreprocessCategoricalResponses, file /opt/local/var/macports/build/
My question is: how do I convert the vector< float > of features into an appropriate matrix to be fed into CvSVM? Obviously I am doing something wrong; OpenCV's tutorial shows that a 2D matrix containing the training data is fed into the SVM. So, how do I convert the features vector into a 2D matrix, and what are the values in the 2nd dimension?
What are these features exactly? Are they the 9 bins of the normalized magnitude histograms?
I found the issue: since I was only testing whether it is correct to pass feature vectors into the SVM in order to train it, I didn't bother to prepare both negative and positive samples.
Yet CvSVM requires at least 2 different classes for training, which is why it threw the error.
Thanks a lot anyway !
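For readers wondering about the shape of the training matrix: the usual layout is one row per image and one column per descriptor value, with one label per row. Building a Mat directly from a single feature vector gives a one-column matrix instead, which a row-sample trainer would read as many one-dimensional samples. The sketch below shows the intended layout in plain Python; the function name `build_training_set` and the +1 / -1 label convention are assumptions for illustration.

```python
def build_training_set(positives, negatives):
    """Stack per-image feature vectors into rows and build matching labels.

    positives / negatives: lists of equal-length feature vectors, one per image.
    Returns (rows, labels) where rows[i] is the descriptor of sample i.
    Hypothetical helper, shown only as a sketch.
    """
    if not positives or not negatives:
        # Mirrors the "There is only a single class" error from the question.
        raise ValueError("need at least one sample of each class")
    rows = [list(vector) for vector in positives] + [list(vector) for vector in negatives]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all feature vectors must have the same length")
    labels = [1] * len(positives) + [-1] * len(negatives)
    return rows, labels


rows, labels = build_training_set(
    positives=[[0.1, 0.2, 0.3]],
    negatives=[[0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
)
```

Here `rows` has 3 rows of 3 values and `labels` is [1, -1, -1]. In OpenCV terms this corresponds to a CV_32FC1 matrix with as many rows as images and as many columns as the descriptor length, plus a label matrix with one entry per row.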
Please can somebody show me sample code or tell me how to use this class and its methods.
I just want to match SURF descriptors from a query image against those of an image set by applying FLANN. I have seen a lot of image matching code in the samples, but what still eludes me is a metric to quantify how similar one image is to another. Any help will be much appreciated.
Here's untested sample code
using namespace std;
using namespace cv;
Mat query; //the query image
vector<Mat> images; //set of images in your db
/* ... get the images from somewhere ... */
vector<vector<KeyPoint> > dbKeypoints;
vector<Mat> dbDescriptors;
vector<KeyPoint> queryKeypoints;
Mat queryDescriptors;
/* ... Extract the descriptors ... */
FlannBasedMatcher flannmatcher;
//train with descriptors from your db
flannmatcher.add(dbDescriptors);
flannmatcher.train();
vector<DMatch > matches;
flannmatcher.match(queryDescriptors, matches);
/* for kk=0 to matches.size()
   the best match for queryKeypoints[matches[kk].queryIdx].pt
   is dbKeypoints[matches[kk].imgIdx][matches[kk].trainIdx].pt
*/
Finding the most 'similar' image to the query image depends on your application. Perhaps the number of matched keypoints is adequate. Or you may need a more complex measure of similarity.
To reduce the number of false positives, you can compare the nearest neighbor to the second nearest neighbor by taking the ratio of their distances.
distance(query, nearestneighbor) / distance(query, secondnearestneighbor) < T. The smaller the ratio, the farther the second nearest neighbor is from the query descriptor relative to the nearest one, which indicates a highly distinctive match. This test is used in many computer vision papers that deal with registration.
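The ratio test can be sketched as follows. This is plain Python operating on precomputed distances, not an OpenCV call; the function name `ratio_test` and the threshold of 0.7 are assumptions for illustration, and a suitable value of T depends on the descriptors and the application.

```python
def ratio_test(neighbour_distances, threshold=0.7):
    """Indices of query descriptors whose best match passes the ratio test.

    neighbour_distances: list of (d1, d2) pairs, where d1 is the distance to
    the nearest neighbour and d2 the distance to the second nearest (d1 <= d2).
    Hypothetical helper, shown only as a sketch.
    """
    good = []
    for index, (d1, d2) in enumerate(neighbour_distances):
        # Skip degenerate pairs to avoid dividing by zero.
        if d2 > 0 and d1 / d2 < threshold:
            good.append(index)
    return good


distances = [(10.0, 100.0), (50.0, 55.0), (0.0, 0.0), (30.0, 60.0)]
good_matches = ratio_test(distances, threshold=0.7)
```

Here `good_matches` is [0, 3]: the second pair is rejected because its two neighbours are almost equally close, and the third is skipped as degenerate. The number of matches that survive, possibly divided by the number of query keypoints, is one simple candidate for an image-to-image similarity score.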