What are geometry-adaptive kernels?

I was reading the paper "CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly
Congested Scenes" and came across the term "geometry-adaptive kernels", but I couldn't find its exact meaning. Can someone please explain this to me?

Geometric algorithms are based on geometric objects such as points, lines and circles. The term kernel refers to a collection of representations for constant-size geometric objects and operations on these representations. The term adaptive kernel, in a broad sense, means that these representations are not immutable or fixed, but rather that they can vary according to some criteria.
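In the specific crowd-counting setting of CSRNet, the term refers to how the ground-truth density maps are generated: every annotated head is blurred with a Gaussian kernel whose spread adapts to the local geometry of the crowd, roughly sigma_i = beta * (average distance from point i to its k nearest annotated neighbours). Below is a minimal C++ sketch of computing such adaptive spreads; beta = 0.3 and k = 3 are only illustrative defaults, and the brute-force neighbour search is just for clarity.
#include <vector>
#include <cmath>
#include <algorithm>
#include <cstdio>

struct Pt { double x, y; };

// For each annotated point, sigma_i = beta * mean distance to its k nearest neighbours.
std::vector<double> adaptiveSigmas(const std::vector<Pt>& pts, int k = 3, double beta = 0.3) {
    std::vector<double> sigmas(pts.size(), 1.0);
    for (size_t i = 0; i < pts.size(); ++i) {
        std::vector<double> d;
        for (size_t j = 0; j < pts.size(); ++j) {
            if (j == i) continue;
            d.push_back(std::hypot(pts[i].x - pts[j].x, pts[i].y - pts[j].y));
        }
        if (d.empty()) continue;                      // lone point: keep a default sigma
        int kk = std::min<int>(k, (int)d.size());
        std::partial_sort(d.begin(), d.begin() + kk, d.end());
        double mean = 0.0;
        for (int n = 0; n < kk; ++n) mean += d[n];
        sigmas[i] = beta * mean / kk;                 // kernel spread follows local crowd density
    }
    return sigmas;
}

int main() {
    std::vector<Pt> heads = {{10, 10}, {12, 11}, {14, 9}, {80, 80}};  // toy head annotations
    for (double s : adaptiveSigmas(heads)) std::printf("%f\n", s);
}
Dense regions thus get narrow kernels and sparse regions get wide ones, which is what makes the kernels geometry-adaptive.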

Related

How to measure distance between Fisher Vector for Image Retrieval?

I've read a bit about Fisher Vectors and I'm still in the learning process. It's a better representation than the classic BoF one, exploiting a GMM (or k-means, although that variant is usually referred to as VLAD).
However, I've seen that they are usually used for classification problems, for example with an SVM.
But what about image retrieval? I've seen that they have been used for image retrieval too (here), but I don't understand one point: given two FVs representing two images, how do we compute their distance, and hence "how similar the two images are"?
Is it reasonable to use them in such a context?
As seen in the two papers below, Euclidean distance seems to be the popular choice. There are also references to using the dot product as a similarity measure; cosine similarity (which is closely related) is a generally popular similarity metric in ML.
http://link.springer.com/article/10.1007/s11263-013-0636-x
http://www.robots.ox.ac.uk/~vgg/publications/2013/Simonyan13/simonyan13.pdf
Is this enough to let you choose something that meets your needs?
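For concreteness, here is a minimal C++ sketch of both options, assuming the two Fisher Vectors have already been computed (the toy values are placeholders):
#include <vector>
#include <cmath>
#include <cstdio>

// Euclidean distance between two Fisher Vectors (smaller = more similar).
double euclidean(const std::vector<float>& a, const std::vector<float>& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Cosine similarity (larger = more similar).
double cosine(const std::vector<float>& a, const std::vector<float>& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (size_t i = 0; i < a.size(); ++i) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12);
}

int main() {
    std::vector<float> fv1 = {0.1f, 0.7f, 0.2f}, fv2 = {0.2f, 0.6f, 0.3f};  // toy vectors
    std::printf("euclidean = %f, cosine = %f\n", euclidean(fv1, fv2), cosine(fv1, fv2));
}
When the FVs are power- and L2-normalized, as is common, the cosine similarity reduces to a plain dot product.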

Bag of Features / Visual Words + Locality Sensitive Hashing

PREMISE:
I'm really new to Computer Vision/Image Processing and Machine Learning (luckily, I'm more of an expert in Information Retrieval), so please be kind with this filthy peasant! :D
MY APPLICATION:
We have a mobile application where the user takes a photo (the query) and the system returns the most similar picture that was previously taken by some other user (the dataset element). Time performance is crucial, followed by precision and finally by memory usage.
MY APPROACH:
First of all, it's quite obvious that this is a 1-Nearest-Neighbor (1-NN) problem. LSH is a popular, fast and relatively precise solution for it. In particular, my LSH implementation uses Kernelized Locality Sensitive Hashing to translate a d-dimensional vector into an s-dimensional binary vector (where s << d) with good precision, and then Fast Exact Search in Hamming Space
with Multi-Index Hashing to quickly find the exact nearest neighbor among all the vectors in the dataset (mapped into Hamming space).
In addition, I'm going to use SIFT, since I want a robust keypoint detector and descriptor for my application.
WHAT IS MISSING IN THIS PROCESS?
Well, it seems that I have already decided everything, right? Actually no: in my linked question I face the problem of how to represent the set of descriptor vectors of a single image as a single vector. Why do I need that? Because a query/dataset element in LSH is a vector, not a matrix (while a SIFT keypoint descriptor set is a matrix). As someone suggested in the comments, the commonest (and most efficient) solution is the Bag of Features (BoF) model, which I'm still not confident with yet.
So, I read this article, but I still have some questions (see QUESTIONS below)!
QUESTIONS:
First and most important question: do you think that this is a reasonable approach?
Is k-means, as used in the BoF algorithm, the best choice for such an application? What are alternative clustering algorithms?
Is the dimension of the codeword vector obtained by BoF equal to the number of clusters (i.e. the k parameter in the k-means approach)?
If 2. is correct, does a bigger k give a more precise BoF vector?
Is there any "dynamic" k-means? Since the query image must be added to the dataset after the computation is done (remember: the dataset is formed by the images of all submitted queries), the clusters can change over time.
Given a query image, is the process to obtain the codeword vector the same as the one for a dataset image, i.e. we assign each descriptor to a cluster and the i-th dimension of the resulting vector is equal to the number of descriptors assigned to the i-th cluster?
It looks like you are building a codebook from a set of keypoint features generated by SIFT.
You can try "mixture of gaussians" model. K-means assumes that each dimension of a keypoint is independent while "mixture of gaussians" can model the correlation between each dimension of the keypoint feature.
I can't answer this one for sure, but remember that a SIFT keypoint descriptor has 128 dimensions by default. You probably want a smaller number of clusters, e.g. around 50.
N/A
You can try Infinite Gaussian Mixture Model or look at this paper: "Revisiting k-means: New Algorithms via Bayesian Nonparametrics" by Brian Kulis and Michael Jordan!
Not sure if I understand this question.
Hope this helps!
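To make questions 2, 3 and 6 concrete, here is a rough C++ sketch of the plain (non-hierarchical) BoF pipeline with OpenCV's k-means, assuming all training descriptors are stacked row-wise in a CV_32F cv::Mat: the codebook has k entries, the resulting histogram therefore has k dimensions, and a query image is encoded exactly like a dataset image. (For the mixture-of-Gaussians alternative mentioned above, cv::ml::EM could replace the kmeans call.)
#include <opencv2/core.hpp>

using namespace cv;

// Build a k-word vocabulary from all training descriptors (one descriptor per row, CV_32F).
Mat buildVocabulary(const Mat& allDescriptors, int k) {
    Mat labels, centers;
    kmeans(allDescriptors, k, labels,
           TermCriteria(TermCriteria::EPS + TermCriteria::COUNT, 100, 1e-4),
           3, KMEANS_PP_CENTERS, centers);
    return centers;                       // k x d matrix: one row per visual word
}

// Encode one image: assign each of its descriptors to the nearest word and count.
Mat encodeImage(const Mat& imgDescriptors, const Mat& vocabulary) {
    Mat hist = Mat::zeros(1, vocabulary.rows, CV_32F);
    for (int i = 0; i < imgDescriptors.rows; ++i) {
        int best = 0;
        double bestDist = norm(imgDescriptors.row(i), vocabulary.row(0), NORM_L2);
        for (int w = 1; w < vocabulary.rows; ++w) {
            double d = norm(imgDescriptors.row(i), vocabulary.row(w), NORM_L2);
            if (d < bestDist) { bestDist = d; best = w; }
        }
        hist.at<float>(0, best) += 1.0f;  // increment the bin of the nearest visual word
    }
    normalize(hist, hist, 1.0, 0.0, NORM_L1);  // make histograms comparable across images
    return hist;
}

int main() {
    Mat train(1000, 128, CV_32F);         // stand-in for stacked SIFT descriptors
    randu(train, Scalar(0), Scalar(1));
    Mat vocab = buildVocabulary(train, 50);
    Mat query(300, 128, CV_32F);          // stand-in for one image's descriptors
    randu(query, Scalar(0), Scalar(1));
    Mat bof = encodeImage(query, vocab);  // 1 x 50 BoF vector
    return 0;
}
The resulting 1 x k histogram is the single vector per image that you would then feed to KLSH.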

Topic model as a dimension reduction method for text mining -- what to do next?

My understanding of the workflow is: run LDA -> extract keywords (e.g. the top few words for each topic), and hence reduce the dimension -> some subsequent analysis.
My question is: if my overall purpose is to assign topics to articles in an unsupervised way, or to cluster similar documents together, then a single run of LDA takes you directly to the goal. Why reduce the dimension and then pass the result to subsequent analysis? And if you do, what sort of subsequent analysis can you do after LDA?
Also, a slightly unrelated question -- is it better to ask this question here or on Cross Validated?
I think Cross Validated is a better place for these kinds of questions. Anyhow, there are simple explanations of why we need dimension reduction:
Without dimension reduction, vector operations become impractical. Imagine a dot product between two vectors whose dimension is the size of your dictionary!
After reducing the dimension, each number carries a denser amount of information, which usually leads to less noise. Intuitively, you keep only the useful information.
You should rethink your approach, since you are mixing probabilistic methods (LDA) with linear algebra (dimensionality reduction). When you feel more comfortable with linear algebra, consider Non-negative Matrix Factorisation.
Also note that your topics already constitute the reduced dimensions; there is no need to jump back to the extracted top words of the topics.
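As a concrete example of subsequent analysis: the document-topic matrix produced by LDA (one row per document, one column per topic) is already the reduced representation, and you can cluster it directly. A rough C++ sketch with OpenCV's k-means, assuming you have exported that matrix from whatever LDA implementation you use (the random matrix below is only a stand-in):
#include <opencv2/core.hpp>
#include <cstdio>

using namespace cv;

int main() {
    // Stand-in for the LDA output: N documents x T topics, rows sum to 1.
    Mat docTopic(500, 20, CV_32F);
    randu(docTopic, Scalar(0), Scalar(1));
    for (int i = 0; i < docTopic.rows; ++i) {
        Mat r = docTopic.row(i);
        normalize(r, r, 1.0, 0.0, NORM_L1);   // turn each row into topic proportions
    }

    // Cluster documents directly in topic space: no further dimension reduction needed.
    Mat labels, centers;
    kmeans(docTopic, 10, labels,
           TermCriteria(TermCriteria::EPS + TermCriteria::COUNT, 100, 1e-4),
           3, KMEANS_PP_CENTERS, centers);

    std::printf("document 0 belongs to cluster %d\n", labels.at<int>(0));
    return 0;
}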

Gaussian Weighting for point re-distribution

I am working with some points that are packed very closely together, and therefore forming clusters among them is proving very difficult. Since I am new to this concept, I read in a paper about Gaussian-weighting the points randomly, or rather resampling them using Gaussian weights.
My question is: how are Gaussian weights applied to the data points? Is it the actual normal distribution, where I have to compute the mean, variance and standard deviation and then randomly sample, or are there other ways to do it? I am confused about this concept.
Can I get some hints on the concept, please?
I think you should look at this book:
http://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738
There are good chapters on modeling point distributions.
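Without knowing the exact paper, one common reading of resampling with Gaussian weights is: give each point a weight proportional to its density under a Gaussian fitted to the data (or under a Gaussian kernel centered on some reference point), then resample points with probability proportional to those weights. A rough 1-D C++ sketch under that assumption:
#include <vector>
#include <cmath>
#include <random>
#include <cstdio>

int main() {
    std::vector<double> pts = {1.0, 1.1, 1.2, 1.25, 1.3, 5.0};  // toy data with one outlier

    // Sample mean and standard deviation of the points.
    double mean = 0.0;
    for (double p : pts) mean += p;
    mean /= pts.size();
    double var = 0.0;
    for (double p : pts) var += (p - mean) * (p - mean);
    double sd = std::sqrt(var / pts.size());

    // Gaussian weight for each point: w_i proportional to exp(-(x_i - mean)^2 / (2 sd^2)).
    std::vector<double> w;
    for (double p : pts) w.push_back(std::exp(-(p - mean) * (p - mean) / (2.0 * sd * sd)));

    // Resample with probability proportional to the weights.
    std::mt19937 rng(42);
    std::discrete_distribution<int> pick(w.begin(), w.end());
    for (int i = 0; i < 10; ++i) std::printf("%f\n", pts[pick(rng)]);
    return 0;
}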

Implementing Vocabulary Tree in OpenCV

I am trying to implement image search based on the paper "Scalable Recognition with a Vocabulary Tree". I am using SURF for extracting the features and keypoints. For example, for an image I get, say, 300 keypoints, and each keypoint has 128 descriptor values. My question is: how do I apply the k-means clustering algorithm to this data? Do I need to apply the clustering algorithm to all the points, i.e. 300*128 values, or do I need to find the distances between consecutive descriptor values, store those, and apply the clustering algorithm to that? I am confused, and any help will be appreciated.
Thanks,
Rocky.
From your question I would say you are quite confused. The vocabulary tree technique is grounded on the use of hierarchical k-means clustering and a TF-IDF weighting scheme for the leaf nodes.
In a nutshell, the clustering algorithm employed for vocabulary tree construction runs k-means once over all the d-dimensional data (d = 128 in the case of SIFT) and then runs k-means again over each of the obtained clusters, recursively, until some depth level. Hence the two main parameters for the vocabulary tree construction are the branching factor k and the tree depth L. Some improvements consider only the branching factor, while the depth is automatically determined by cutting the tree to fulfill a minimum variance measure.
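A rough C++ sketch of that recursion with OpenCV's kmeans, parameterized only by the branching factor k and the depth L (TF-IDF weighting and the query side are omitted, and the tree is kept in a plain struct rather than a cv::Mat):
#include <opencv2/core.hpp>
#include <memory>
#include <vector>

using namespace cv;

struct VocabNode {
    Mat center;                                  // d-dimensional cluster center (visual word); empty at the root
    std::vector<std::unique_ptr<VocabNode>> children;
};

// Recursively split the descriptors: k-means at this level, then k-means again inside each cluster.
std::unique_ptr<VocabNode> buildTree(const Mat& descriptors, int k, int depth) {
    auto node = std::make_unique<VocabNode>();
    if (depth == 0 || descriptors.rows < k) return node;   // leaf: max depth or not enough data

    Mat labels, centers;
    kmeans(descriptors, k, labels,
           TermCriteria(TermCriteria::EPS + TermCriteria::COUNT, 50, 1e-3),
           3, KMEANS_PP_CENTERS, centers);

    for (int c = 0; c < k; ++c) {
        // Gather the descriptors assigned to cluster c and recurse on them.
        Mat subset;
        for (int i = 0; i < descriptors.rows; ++i)
            if (labels.at<int>(i) == c) subset.push_back(descriptors.row(i));
        auto child = buildTree(subset, k, depth - 1);
        child->center = centers.row(c).clone();
        node->children.push_back(std::move(child));
    }
    return node;
}

int main() {
    Mat descriptors(2000, 128, CV_32F);          // stand-in for all SIFT/SURF descriptors
    randu(descriptors, Scalar(0), Scalar(1));
    auto root = buildTree(descriptors, /*k=*/4, /*L=*/3);
    return 0;
}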
As for the implementation, cv::BOWTrainer from OpenCV is a good starting point, though it is not very well generalized to a hierarchical BoW scheme, since it requires the centers to be stored in a simple cv::Mat. A vocabulary tree is typically unbalanced, and mapping it to a matrix in a level-wise fashion might not be memory-efficient when the number of nodes n is much lower than the theoretical number of nodes in a balanced tree with depth L and branching factor k, that is:
n << (1-k^L)/(1-k)
From what I know, I think you have to store all the descriptors in a cv::Mat and then feed it to a k-means trainer, so you can finally apply the clustering algorithm. Here is a snippet that can give you an idea of what I am talking about:
BOWKMeansTrainer bowtrainer(1000); //num clusters
bowtrainer.add(training_descriptors); // we add the descriptors
Mat vocabulary = bowtrainer.cluster(); // apply the clustering algorithm
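Once the vocabulary has been clustered, OpenCV can also build the per-image BoW histogram for you via BOWImgDescriptorExtractor. A possible continuation of the snippet above, assuming OpenCV >= 4.4 (where SIFT lives in the main features2d module), the opencv2/features2d.hpp and opencv2/imgcodecs.hpp headers, using namespace cv, and a placeholder image path:
Ptr<Feature2D> sift = SIFT::create();                     // xfeatures2d::SIFT in older OpenCV versions
Ptr<DescriptorMatcher> matcher = BFMatcher::create(NORM_L2);
BOWImgDescriptorExtractor bowExtractor(sift, matcher);
bowExtractor.setVocabulary(vocabulary);                   // the Mat returned by bowtrainer.cluster()

Mat image = imread("query.jpg", IMREAD_GRAYSCALE);        // placeholder path
std::vector<KeyPoint> keypoints;
sift->detect(image, keypoints);

Mat bowDescriptor;                                        // 1 x num_clusters normalized histogram
bowExtractor.compute(image, keypoints, bowDescriptor);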
And this may also be interesting to you: http://www.morethantechnical.com/2011/08/25/a-simple-object-classifier-with-bag-of-words-using-opencv-2-3-w-code/
Good luck!!
Check out the code in libvot, in src/vocab_tree/clustering.*; there you can find a detailed implementation of the clustering algorithm.
