OpenCV FREAK: Can I use the FREAK feature descriptor with BOW?

I am trying to use FREAK in OpenCV to detect features and extract descriptors, then build my BOW vocabulary and, for each image, match its descriptors against the vocabulary to get its BOW representation. You know, the whole thing. I know BOW can be used with other descriptors like SIFT or SURF, but it is not clear to me whether FREAK descriptors, which are binary, can be used with BOW. More specifically, when OpenCV builds a BOW vocabulary, it uses k-means clustering, and it is not clear to me what distance function the k-means algorithm uses. For binary descriptors like FREAK, Hamming distance seems to be the only choice.
It looks to me like OpenCV's k-means only uses Euclidean distance, bummer. It looks like I have to build my own k-means and my own vocabulary matching. Do any smart people out there know a workaround?
Thanks!
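One workaround for the question above, assuming you are willing to write your own clustering: a k-means variant for binary descriptors that assigns descriptors to centroids by Hamming distance and updates each centroid by a per-bit majority vote (this is sometimes called k-majority). The sketch is hypothetical and not an OpenCV API; it packs each descriptor into a Python int (a 64-byte FREAK descriptor could be packed with int.from_bytes), and uses a simple farthest-point initialisation as a deterministic stand-in for k-means++.

```python
import random

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed binary descriptors."""
    return bin(a ^ b).count("1")

def nearest(d: int, centroids: list) -> int:
    """Index of the centroid closest to d in Hamming distance (its visual word)."""
    return min(range(len(centroids)), key=lambda c: hamming(d, centroids[c]))

def majority_centroid(members: list, n_bits: int) -> int:
    """Bit i of the centroid is 1 if more than half of the members have bit i set."""
    centroid = 0
    for i in range(n_bits):
        ones = sum((m >> i) & 1 for m in members)
        if 2 * ones > len(members):
            centroid |= 1 << i
    return centroid

def farthest_point_init(descriptors: list, k: int, rng: random.Random) -> list:
    """Pick one random descriptor, then repeatedly the one farthest from those chosen."""
    centroids = [rng.choice(descriptors)]
    while len(centroids) < k:
        centroids.append(max(descriptors,
                             key=lambda d: min(hamming(d, c) for c in centroids)))
    return centroids

def k_majority(descriptors: list, k: int, n_bits: int, iters: int = 10, seed: int = 0) -> list:
    """Lloyd-style clustering of binary descriptors; returns k binary centroids."""
    centroids = farthest_point_init(descriptors, k, random.Random(seed))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(d, centroids)].append(d)
        # keep the previous centroid if a cluster ends up empty
        updated = [majority_centroid(c, n_bits) if c else old
                   for c, old in zip(clusters, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

The vocabulary would then be `k_majority(train_descriptors, k, n_bits=512)` and each new descriptor's word `nearest(d, vocabulary)`; pure Python is far too slow for real data, so treat this only as a statement of the algorithm.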

I read in a paper that FREAK is not easy to use. Here is the excerpt from the paper: "....These algorithms cannot be easily used in many retrieval algorithms because they must be compared with a Hamming distance, which is not easily adapted to accelerated search structures such as vocabulary trees or Approximate Nearest Neighbors (ANN)...."
(The algorithms referred to are ORB, FREAK and BRISK.)

FREAK works with locality-sensitive hashing (LSH). You can use it with FLANN (Fast Library for Approximate Nearest Neighbors), which is included in OpenCV.
For BOW, only the first 5, 6, 7 or 8 bytes of the descriptor might be sufficient to construct the tree.
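In OpenCV, LSH for binary descriptors is configured through FLANN with `cv::flann::LshIndexParams(table_number, key_size, multi_probe_level)`. To show what such an index does, here is a hypothetical pure-Python bit-sampling LSH (not OpenCV's implementation): each table hashes a descriptor by a fixed random subset of its bits, so descriptors at small Hamming distance tend to collide in at least one table, and only colliding candidates are compared exactly.

```python
import random
from collections import defaultdict

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

class BitSamplingLSH:
    """Bit-sampling LSH over binary descriptors packed into ints (illustrative only)."""

    def __init__(self, n_bits: int, key_bits: int, n_tables: int, seed: int = 0):
        rng = random.Random(seed)
        # each table: (bit positions used as the hash key, key -> list of indices)
        self.tables = [(rng.sample(range(n_bits), key_bits), defaultdict(list))
                       for _ in range(n_tables)]
        self.data = []

    @staticmethod
    def _key(positions, d: int) -> int:
        key = 0
        for p in positions:
            key = (key << 1) | ((d >> p) & 1)
        return key

    def add(self, d: int) -> int:
        idx = len(self.data)
        self.data.append(d)
        for positions, buckets in self.tables:
            buckets[self._key(positions, d)].append(idx)
        return idx

    def query(self, d: int):
        """Index of the closest stored descriptor among colliding candidates, or None."""
        candidates = set()
        for positions, buckets in self.tables:
            candidates.update(buckets.get(self._key(positions, d), ()))
        if not candidates:
            return None
        return min(candidates, key=lambda i: hamming(d, self.data[i]))
```

Using the first bytes of a FREAK descriptor as the key, as suggested above, corresponds to choosing the leading bit positions instead of random ones; FREAK orders its comparisons roughly from coarse to fine, so those leading bits carry the coarse structure.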

Related

OpenCV Face Verification

Is there a way I can implement face recognition using OpenCV? I tried LBPH, training it with one image. It gives a confidence score, but I am not sure how accurate it is for verification.
My question is: how can I create a face recognition system with OpenCV that tells me how similar two faces are, or whether they belong to the same person? The confidence score does not seem to be an accurate measure, if I'm doing this correctly.
Also, is a higher confidence score better?
Thanks
OpenCV 3 currently supports the following algorithms for face recognition:
- Eigenfaces (see createEigenFaceRecognizer())
- Fisherfaces (see createFisherFaceRecognizer())
- Local Binary Patterns Histograms (see createLBPHFaceRecognizer())
The confidence score returned by these algorithms is the similarity measure between faces (in OpenCV it is actually a distance, so lower values mean more similar faces), but these methods are really old and perform poorly. I'd suggest you try the approach in this article: http://www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdf
Basically, you need to download the trained Caffe model from here: http://www.robots.ox.ac.uk/~vgg/software/vgg_face/src/vgg_face_caffe.tar.gz
Use OpenCV to run this classifier as shown in this example:
http://docs.opencv.org/trunk/d5/de7/tutorial_dnn_googlenet.html#gsc.tab=0
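Then collect the 4096-float feature layer from the Caffe network (in VGG-Face this is fc7; fc8 is the identity-classification layer with 2622 outputs), and calculate your similarity as the L2 norm of the difference between the feature vectors computed for your two faces.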
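A minimal sketch of the comparison step, assuming the two 4096-float feature vectors have already been extracted from the network into plain Python lists. The L2 normalisation before taking the distance and the `threshold` parameter are my own assumptions, not part of the answer: a threshold has to be tuned on labelled same/different pairs, because the distance alone does not say where "same person" ends.

```python
import math

def l2_normalize(v):
    """Scale a feature vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def face_distance(f1, f2):
    """L2 distance between two normalised face feature vectors (0 = identical direction)."""
    a, b = l2_normalize(f1), l2_normalize(f2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_person(f1, f2, threshold):
    """Hypothetical verification rule: accept when the distance is below a tuned threshold."""
    return face_distance(f1, f2) < threshold
```

With this convention, lower distances mean more similar faces, and for unit vectors the distance ranges from 0 to 2.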

How to measure distance between Fisher Vector for Image Retrieval?

I've read something about Fisher Vectors and I'm still in the learning process. It is a better representation than the classic BoF representation, exploiting a GMM (or k-means, although that variant is usually referred to as VLAD).
However, I've seen that they are usually used for classification problems, for example with an SVM.
But what about image retrieval? I've seen that they have been used for image retrieval too (here), but I don't understand one point: given two FVs representing two images, how do we compute the distance between them, i.e. how similar are the two images?
Is it reasonable to use them in such a context?
As seen in the two papers below, Euclidean distance seems to be the popular choice. There are also references to using dot-product as a similarity measure; cosine similarity (closely related) is a generally popular metric for ML similarity.
http://link.springer.com/article/10.1007/s11263-013-0636-x
http://www.robots.ox.ac.uk/~vgg/publications/2013/Simonyan13/simonyan13.pdf
Is this enough to let you choose something that meets your needs?
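To make the choice concrete: Fisher vectors are commonly power-normalised (signed square root) and then L2-normalised before comparison; that normalisation is an assumption here rather than something stated in the answer. Once vectors have unit length, squared Euclidean distance and dot product are tied by ||a - b||² = 2 - 2 a·b, so ranking a database by smallest Euclidean distance gives the same order as ranking by largest dot product (cosine similarity). A plain-Python sketch:

```python
import math

def power_l2_normalize(fv, alpha=0.5):
    """Signed power normalisation followed by L2 normalisation."""
    v = [math.copysign(abs(x) ** alpha, x) for x in fv]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_database(query, database):
    """Indices of database vectors ordered from most to least similar (largest dot product first)."""
    q = power_l2_normalize(query)
    db = [power_l2_normalize(v) for v in database]
    return sorted(range(len(db)), key=lambda i: -dot(q, db[i]))
```

Because of the identity above, choosing between Euclidean distance and dot product on normalised Fisher vectors is mostly a matter of convenience: the dot product is cheaper and maps directly onto matrix multiplication for large databases.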

In scikit-learn, can DBSCAN use sparse matrix?

I got a MemoryError when running scikit-learn's DBSCAN.
My data is a binary matrix of about 20000 x 10000.
(Maybe DBSCAN is not suitable for such a matrix. I'm a beginner in machine learning. I just want to find a clustering method that doesn't need the number of clusters in advance.)
Anyway, I found the documentation on scikit-learn's feature extraction and on sparse matrices:
http://scikit-learn.org/dev/modules/feature_extraction.html
http://docs.scipy.org/doc/scipy/reference/sparse.html
But I still have no idea how to use them. DBSCAN's documentation does not mention sparse matrices. Are they not allowed?
If anyone knows how to use a sparse matrix with DBSCAN, please tell me.
Or you can suggest a more suitable clustering method.
The scikit implementation of DBSCAN is, unfortunately, very naive. It needs to be rewritten to take indexing (ball trees etc.) into account.
As of now, it apparently insists on computing a complete distance matrix, which wastes a lot of memory.
May I suggest that you just reimplement DBSCAN yourself. It's fairly easy; there is good pseudocode, e.g. on Wikipedia and in the original publication. It should be just a few lines, and you can then easily take advantage of your data representation. E.g. if you already have a similarity graph in a sparse representation, it's usually fairly trivial to do a "range query" (i.e. use only the edges that satisfy your distance threshold).
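Following that suggestion, here is a sketch of the Wikipedia-style DBSCAN where the "range query" is just a lookup of precomputed neighbours. The `neighbors_from_edges` helper and the `(i, j, distance)` edge format are hypothetical; with a SciPy sparse matrix you would build the same adjacency from its nonzero entries. Memory grows with the number of edges within eps rather than with n_samples².

```python
from collections import deque

def neighbors_from_edges(n, edges, eps):
    """Hypothetical helper: adjacency lists keeping only edges with distance <= eps."""
    nbrs = {i: [] for i in range(n)}
    for i, j, dist in edges:
        if dist <= eps:
            nbrs[i].append(j)
            nbrs[j].append(i)
    return nbrs

def dbscan_graph(neighbors, min_pts):
    """DBSCAN over a precomputed eps-neighbourhood graph.
    Returns {point: cluster id}, with -1 for noise. The point itself counts toward min_pts."""
    UNVISITED, NOISE = None, -1
    labels = {p: UNVISITED for p in neighbors}
    cluster = 0
    for p in neighbors:
        if labels[p] is not UNVISITED:
            continue
        if len(neighbors[p]) + 1 < min_pts:   # not a core point (for now)
            labels[p] = NOISE
            continue
        labels[p] = cluster
        queue = deque(neighbors[p])
        while queue:
            q = queue.popleft()
            if labels[q] == NOISE:
                labels[q] = cluster           # noise reached from a core point: border point
            if labels[q] is not UNVISITED:
                continue
            labels[q] = cluster
            if len(neighbors[q]) + 1 >= min_pts:
                queue.extend(neighbors[q])    # q is a core point: keep expanding
        cluster += 1
    return labels
```

Only core points expand the cluster; border points are labelled but not expanded, as in the original algorithm.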
Here is an issue on the scikit-learn GitHub where they discuss improving the implementation. A user reports that his version using a ball tree is 50x faster (which doesn't surprise me; I've seen similar speedups with indexes before, and the gain will likely become more pronounced as the data set grows).
Update: the DBSCAN version in scikit-learn has received substantial improvements since this answer was written.
You can pass a distance matrix to DBSCAN, so assuming X is your sample matrix, the following should work:
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import euclidean_distances
D = euclidean_distances(X, X)  # dense (n_samples, n_samples) distance matrix
db = DBSCAN(metric="precomputed").fit(D)
However, the matrix D will be even larger than X: n_samples² entries. With sparse matrices, k-means is probably the best option.
(DBSCAN may seem attractive because it doesn't need a predetermined number of clusters, but it trades that for two parameters that you have to tune. It's mostly applicable in settings where the samples are points in space and you know how close points must be to belong to the same cluster, or when you have a black-box distance metric that scikit-learn doesn't support.)
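A quick back-of-the-envelope check of why the precomputed matrix hurts for the 20000 x 10000 data in the question, assuming 8-byte float64 entries:

```python
def dense_bytes(rows: int, cols: int, itemsize: int = 8) -> int:
    """Memory needed by a dense rows x cols array with itemsize-byte entries."""
    return rows * cols * itemsize

n_samples, n_features = 20000, 10000
x_bytes = dense_bytes(n_samples, n_features)   # X stored densely as float64
d_bytes = dense_bytes(n_samples, n_samples)    # full pairwise distance matrix D
print(f"X: {x_bytes / 1e9:.1f} GB, D: {d_bytes / 1e9:.1f} GB")  # X: 1.6 GB, D: 3.2 GB
```

So D alone needs about 3.2 GB, twice a dense float64 X, which is consistent with the MemoryError; a sparse neighbourhood structure avoids materialising either.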
Yes, since version 0.16.1.
Here's a commit for a test:
https://github.com/scikit-learn/scikit-learn/commit/494b8e574337e510bcb6fd0c941e390371ef1879
scikit-learn's DBSCAN doesn't accept sparse arrays. However, KMeans and spectral clustering do, so you can try those. More on scikit-learn's clustering methods: http://scikit-learn.org/stable/modules/clustering.html

Is Dense SIFT better for Bag-Of-Words than SIFT?

I'm implementing a Bag-of-Words image classifier using OpenCV. Initially I tested SURF descriptors extracted at SURF keypoints. I've heard that Dense SIFT (or PHOW) descriptors can work better for my purposes, so I tried them too.
To my surprise, they performed significantly worse, actually almost 10 times worse. What could I be doing wrong? I'm using OpenCV's DenseFeatureDetector to get keypoints. I'm extracting about 5000 descriptors per image from 9 layers and clustering them into 500 clusters.
Should I try PHOW descriptors from the VLFeat library? Also, I can't use the chi-square kernel in OpenCV's SVM implementation, which is recommended in many papers. Is this crucial to classifier quality? Should I try another library?
Another question concerns scale invariance: I suspect it can be affected by dense feature extraction. Am I right?
It depends on the problem. You should try different techniques to find out which one works best for your problem. PHOW is usually very useful when you need to classify any kind of scene.
You should know that PHOW is a little bit different from plain Dense SIFT. I used VLFeat's PHOW a few years ago, and looking at the code, it just calls dense SIFT with different sizes, plus some smoothing. That could be one clue to how it achieves scale invariance.
Also, in my experiments I used libsvm, and histogram intersection turned out to be the best kernel for me. By default, chi-square and histogram intersection kernels are included neither in libsvm nor in OpenCV's SVM (which is based on libsvm). You are the one to decide whether you should try them. I can tell you that the RBF kernel achieved nearly 90% accuracy, whereas histogram intersection achieved 93% and chi-square 91%. But those results were from my own specific experiments. You should start with RBF and automatically tuned parameters, and see if it's enough.
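For reference, a sketch of the three kernels compared above, applied to L1-normalised BoW histograms. The chi-square kernel exists in several variants; the exponential form exp(-gamma * Σ (a-b)²/(a+b)) used here is one common choice, and `gamma` is a parameter you would have to tune. Since neither library ships these kernels by default, one route is to compute the Gram matrix yourself and give it to libsvm as a precomputed kernel (its `-t 4` option).

```python
import math

def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; equals 1.0 for identical L1-normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def chi2_kernel(h1, h2, gamma=1.0):
    """Exponential chi-square kernel; bins that are empty in both histograms are skipped."""
    d = sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
    return math.exp(-gamma * d)

def rbf_kernel(h1, h2, gamma=1.0):
    """Gaussian RBF kernel on the squared Euclidean distance."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(h1, h2)))

def gram_matrix(hists, kernel):
    """Kernel values between all pairs of histograms (the precomputed-kernel input)."""
    return [[kernel(a, b) for b in hists] for a in hists]
```

Histogram intersection and chi-square compare bins relative to their mass, which is one intuition for why they often suit BoW histograms better than RBF; as the answer says, only your own experiments can confirm that.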
In summary, it all depends on your specific experiments. But if you use Dense SIFT, maybe you could try increasing the number of clusters and calling Dense SIFT at different scales (I recommend the PHOW way).
EDIT: I was looking at OpenCV's dense feature detector, and maybe you could start with
m_detector = new DenseFeatureDetector(4, 4, 1.5); // initFeatureScale = 4, featureScaleLevels = 4, featureScaleMul = 1.5
knowing that VLFeat's PHOW uses [4 6 8 10] as bin sizes.
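Assuming the OpenCV 2.4 constructor order DenseFeatureDetector(initFeatureScale, featureScaleLevels, featureScaleMul, initXyStep = 6, ...) and that each level multiplies the keypoint size (and, by default, the grid step) by featureScaleMul, the suggested (4, 4, 1.5) gives the levels worked out here. This is my reading of the OpenCV source, re-created in plain Python for illustration only; `dense_grid_levels` is hypothetical. Note that the sizes grow geometrically rather than linearly like PHOW's [4 6 8 10], and an OpenCV keypoint size is not necessarily the same quantity as a VLFeat bin size.

```python
def dense_grid_levels(init_scale=1.0, levels=1, scale_mul=0.1, init_step=6,
                      vary_step_with_scale=True):
    """Hypothetical re-creation of how OpenCV 2.4's DenseFeatureDetector steps
    keypoint size and grid spacing from one level to the next."""
    out = []
    scale, step = init_scale, init_step
    for _ in range(levels):
        out.append((scale, step))               # all keypoints at this level share this size
        scale = scale * scale_mul               # size grows geometrically
        if vary_step_with_scale:
            step = int(step * scale_mul + 0.5)  # grid step rounded to an integer
    return out

for size, step in dense_grid_levels(4, 4, 1.5):
    print(f"keypoint size {size:g}, grid step {step}")
# sizes 4, 6, 9, 13.5 with grid steps 6, 9, 14, 21
```

To mimic PHOW's linear spacing you would instead run the detector (or dense SIFT) once per desired size.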

Implementing Vocabulary Tree in OpenCV

I am trying to implement image search based on the paper "Scalable Recognition with a Vocabulary Tree". I am using SURF to extract keypoints and descriptors. For example, for an image I'm getting, say, 300 keypoints, and each keypoint has 128 descriptor values. My question is how I can apply the k-means clustering algorithm to this data. Do I need to apply the clustering algorithm to all the points, i.e., 300*128 values, or do I need to compute the distances between consecutive descriptor values, store them, and apply the clustering algorithm to those? I am confused and any help will be appreciated.
Thanks,
Rocky.
From your question, I would say you are quite confused. The vocabulary tree technique is grounded in the use of hierarchical k-means clustering and a TF-IDF weighting scheme for the leaf nodes.
In a nutshell, the clustering algorithm used to build the vocabulary tree runs k-means once over all the d-dimensional data (d=128 for SIFT) and then runs k-means again on each of the resulting clusters, recursively, down to some depth level. Hence the two main parameters of vocabulary tree construction are the branching factor k and the tree depth L. Some improvements use only the branching factor, while the depth is determined automatically by cutting the tree to satisfy a minimum-variance criterion.
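A sketch of the TF-IDF weighting mentioned in the answer, treating each image as a list of visual-word ids (leaf or node ids from the tree). It assumes the weighting w_i = ln(N / N_i) as I understand it from the vocabulary tree paper (N images in the database, N_i of them containing word i), L1-normalised vectors, and L1 distance for scoring; treat those specific choices as assumptions to check against the paper.

```python
import math
from collections import Counter

def idf_weights(db_words):
    """db_words: one list of visual-word ids per database image. Returns {word: ln(N / N_i)}."""
    n_images = len(db_words)
    doc_freq = Counter()
    for words in db_words:
        doc_freq.update(set(words))   # count each word at most once per image
    return {w: math.log(n_images / n_i) for w, n_i in doc_freq.items()}

def bow_vector(words, idf):
    """Term frequency times idf, L1-normalised; words never seen in the database are ignored."""
    counts = Counter(w for w in words if w in idf)
    v = {w: c * idf[w] for w, c in counts.items()}
    total = sum(abs(x) for x in v.values())
    return {w: x / total for w, x in v.items()} if total > 0 else v

def l1_distance(a, b):
    """L1 distance between two sparse vectors stored as dicts (lower = more similar)."""
    return sum(abs(a.get(w, 0.0) - b.get(w, 0.0)) for w in set(a) | set(b))
```

A word that appears in every database image gets weight ln(1) = 0, so it cannot help distinguish images, which is the point of the IDF term.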
As for the implementation, OpenCV's cv::BOWTrainer is a good starting point, though it does not generalize well to a hierarchical BoW scheme: it requires the centers to be stored in a plain cv::Mat, while a vocabulary tree is typically unbalanced. Mapping the tree to a matrix level by level might not be memory-efficient when the actual number of nodes n is much lower than the theoretical number of nodes in a balanced tree with depth L and branching factor k, that is:
n << (1-k^L)/(1-k)
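To make the hierarchical clustering concrete, here is a plain-Python sketch of building a vocabulary tree with branching factor k and depth L, and of quantising a descriptor by descending greedily to a leaf. It is not cv::BOWTrainer; the nested (center, children) representation is a hypothetical choice that stores only the nodes that actually exist, which is the memory point made above. Real descriptors would be 64- or 128-dimensional tuples; tiny 2-D points keep the example readable.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def assign(points, centers):
    """Split points into one group per center (nearest center wins)."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        groups[j].append(p)
    return groups

def kmeans(points, k, rng, iters=20):
    """Flat k-means (Lloyd's algorithm) initialised with k random points."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = assign(points, centers)
        updated = [mean(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)

def build_tree(points, k, depth, rng):
    """Return a list of child nodes (center, children); leaves have children == []."""
    if depth == 0 or len(points) <= k:
        return []
    centers, groups = kmeans(points, k, rng)
    # hierarchical k-means: recurse on the points of each cluster
    return [(c, build_tree(g, k, depth - 1, rng)) for c, g in zip(centers, groups)]

def quantize(tree, p):
    """Descend greedily to a leaf; the path of child indices identifies the visual word."""
    path, children = [], tree
    while children:
        j = min(range(len(children)), key=lambda c: sq_dist(p, children[c][0]))
        path.append(j)
        children = children[j][1]
    return tuple(path)
```

Quantising a descriptor costs k * L distance computations instead of one per word of a flat vocabulary with k^L words, which is what makes large vocabularies practical.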
As far as I know, you have to store all the descriptors in a cv::Mat and add it to a k-means trainer (BOWKMeansTrainer); then you can apply the clustering algorithm. Here is a snippet that gives an idea of what I mean:
BOWKMeansTrainer bowtrainer(1000); // number of clusters (vocabulary size)
bowtrainer.add(training_descriptors); // one descriptor per row; k-means needs CV_32F data
Mat vocabulary = bowtrainer.cluster(); // run k-means; one cluster center per row
And this maybe can be interesting to you: http://www.morethantechnical.com/2011/08/25/a-simple-object-classifier-with-bag-of-words-using-opencv-2-3-w-code/
Good luck!!
Check out the code in libvot: in src/vocab_tree/clustering.* you can find a detailed implementation of the clustering algorithm.
