Cocke Younger Kasami (CYK) algorithm and computer vision - parsing

You may have seen the triangular matrix for syntactic analysis. Is there any implementation of the CYK Algorithm using computer vision?
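For reference, here is a minimal sketch of the CYK chart itself (the triangular table the question refers to) in Python, over a toy Chomsky-normal-form grammar; the grammar and sentence are purely illustrative:

    from itertools import product

    # Toy grammar in Chomsky normal form: maps a pair of non-terminals to the
    # set of non-terminals that can produce it (illustrative example only).
    grammar = {
        ("NP", "VP"): {"S"},
        ("Det", "N"): {"NP"},
        ("V", "NP"): {"VP"},
    }
    lexicon = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}

    def cyk(words):
        n = len(words)
        # table[i][j] holds the non-terminals spanning words[i : i + j + 1],
        # i.e. the triangular CYK matrix.
        table = [[set() for _ in range(n)] for _ in range(n)]
        for i, w in enumerate(words):
            table[i][0] = set(lexicon.get(w, set()))
        for span in range(2, n + 1):              # length of the span
            for start in range(n - span + 1):     # start of the span
                for split in range(1, span):      # split point inside the span
                    left = table[start][split - 1]
                    right = table[start + split][span - split - 1]
                    for a, b in product(left, right):
                        table[start][span - 1] |= grammar.get((a, b), set())
        return "S" in table[0][n - 1], table

    print(cyk("the dog saw the cat".split())[0])  # True for this toy grammar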

Related

How to get all the local "peaks" from a picture?

I would like to get all the small peaks from the picture below. I would also like to avoid getting the big peak (on the right), but I can exclude that based on its area. The image is the result of a Sobel operator, but this is not important. It will be used as markers for the watershed algorithm. This is not the only picture I have to process, and I can't rely on the values of the peaks, only on the fact that they are local maxima. Notice that contour detection won't work, as some small bumps are connected. The picture:
You may find Chapter 4, "Peaks and Ridges", of my 1999 thesis useful. It uses the concept of the watershed algorithm, implemented using Vincent's ordered-pixel approach. Algorithm details are in the thesis. I have not been working in the area for many years, so I have not done an OpenCV implementation.
@Alessandro Jacopson provides the following reference in another SO watershed question:
Vincent, Luc; Soille, Pierre. "Watersheds in digital spaces: an efficient algorithm based on immersion simulations." IEEE Transactions on Pattern Analysis and Machine Intelligence, 1991, 13(6): 583-598.
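Not from the thesis above, but as a common starting point: local maxima can be picked out with dilation-based non-maximum suppression and then labelled for use as watershed markers. A minimal OpenCV/NumPy sketch, where the neighbourhood size and minimum height are assumed values you would tune (the large peak can still be filtered by area afterwards):

    import cv2
    import numpy as np

    def local_peak_markers(gray, kernel_size=9, min_height=10):
        # A pixel is a local maximum if it equals the maximum of its
        # neighbourhood, obtained by comparing the image with its dilation.
        kernel = np.ones((kernel_size, kernel_size), np.uint8)
        dilated = cv2.dilate(gray, kernel)
        peaks = (gray == dilated) & (gray > min_height)   # drop flat background
        # Label connected peak regions so they can serve as watershed markers.
        num_labels, markers = cv2.connectedComponents(peaks.astype(np.uint8))
        return markers   # 0 = background, 1..num_labels-1 = individual peaks

    # Usage sketch: 'sobel.png' stands in for the gradient image in the question.
    img = cv2.imread("sobel.png", cv2.IMREAD_GRAYSCALE)
    markers = local_peak_markers(img)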

What is the difference between the normal equation and gradient descent for polynomial regression?

I'm new to machine learning and willing to study and work with it. It's just that I still don't understand the benefits of using the normal equation on some occasions compared with gradient descent. I'm using Andrew Ng's course on Coursera, but the notation makes it hard for me to understand.
I want to know more about the derivation of the cost function J(θ) for polynomial regression and the reason why he uses the transpose of the vector x^(i).
The normal equation is beneficial in cases where we don't want to choose a learning rate alpha. Also, it is a non-iterative algorithm, so it minimizes the cost function in less time (but only when the number of features is small).
Since the normal equation works with a matrix and its inverse, it is a relatively computationally expensive approach compared to gradient descent when the number of features is large.
Refer to the link below for the derivation of the cost function for polynomial regression:
Cost function for Polynomial regression
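To make the trade-off above concrete, here is a minimal NumPy sketch of both approaches on a small polynomial design matrix; the data, learning rate, and iteration count are arbitrary placeholders:

    import numpy as np

    # Toy data: fit y = 2 + 3x with a quadratic polynomial model.
    x = np.linspace(-1, 1, 50)
    y = 2 + 3 * x + 0.1 * np.random.randn(50)
    X = np.column_stack([np.ones_like(x), x, x**2])   # design matrix [1, x, x^2]

    # Normal equation: theta = (X^T X)^{-1} X^T y. No alpha, no iterations,
    # but solving/inverting X^T X costs O(n^3) in the number of features.
    theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

    # Batch gradient descent on the same cost J(theta) = (1/2m) * ||X theta - y||^2.
    theta_gd = np.zeros(X.shape[1])
    alpha, m = 0.1, len(y)            # the learning rate must be chosen by hand
    for _ in range(5000):
        grad = (X.T @ (X @ theta_gd - y)) / m
        theta_gd -= alpha * grad

    print(theta_ne, theta_gd)         # both converge to similar coefficients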

How can PCA be used for SIFT or VLAD vectors?

I'm reading a lot of papers about VLAD and Fisher Vectors (FV). In particular, in this paper (and essentially every paper talking about the topic) the authors use PCA for reducing SIFT, VLAD and FV dimensions.
However, from what I understand, PCA involves computing the eigenvalues of the covariance matrix, and we can compute eigenvalues only for square matrices.
Now, suppose that we want to compute PCA for 1M SIFT vectors. How can we compute PCA on a 1Mx128 matrix?
My understanding from this question is that SVD is an alternative, but I'm quite surprised, since nobody in any of the papers ever talks about SVD! Did I miss something?
Implementations of VLAD and Fisher vectors do tend to use PCA to reduce the dimensionality of the image patches. Most papers report typical values such as DIM=64 while using on the order of 1M patches, which makes it difficult to apply SVD directly to the full data matrix.
I have seen implementations of PCA for SIFT which use the iterative algorithm reported here: https://en.wikipedia.org/wiki/Principal_component_analysis#Iterative_computation.
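For completeness, plain PCA also stays tractable here because the covariance matrix of 128-dimensional descriptors is only 128x128 regardless of how many descriptors you have. A minimal NumPy sketch under that assumption (random data stands in for real SIFT vectors):

    import numpy as np

    def pca_reduce(descriptors, out_dim=64):
        # descriptors: (n, 128) array of SIFT vectors; n can be large,
        # since the covariance below is 128 x 128 regardless of n.
        mean = descriptors.mean(axis=0)
        centered = descriptors - mean
        cov = centered.T @ centered / (len(descriptors) - 1)   # 128 x 128
        eigvals, eigvecs = np.linalg.eigh(cov)                 # symmetric -> eigh
        order = np.argsort(eigvals)[::-1][:out_dim]            # top components
        projection = eigvecs[:, order]                         # 128 x out_dim
        return centered @ projection, mean, projection

    # Usage sketch: random placeholders for real SIFT descriptors.
    sift = np.random.rand(100_000, 128).astype(np.float32)
    reduced, mean, proj = pca_reduce(sift, out_dim=64)         # (100000, 64)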

OpenCV FREAK - Can I use the FREAK feature descriptor with BOW?

I am trying to use FREAK in OpenCV to detect features and extract descriptors, then build my BOW vocabulary and, for each image, use the vocabulary to match with BOW. You know, the whole thing. I know BOW can be used with other descriptors like SIFT or SURF, but it is not clear to me whether FREAK descriptors, which are binary, can be used with BOW. More specifically, when OpenCV builds a BOW vocabulary, it uses k-means clustering. It is not clear to me what distance function the k-means clustering algorithm uses. For binary descriptors like FREAK, Hamming distance seems to be the only choice.
It looks to me like OpenCV's k-means only uses Euclidean distance when calculating distances, bummer. Looks like I have to build my own k-means and my own vocabulary matching. Any smart people out there know a workaround?
Thanks!
I read in a paper that FREAK is not easy to use. Here is the excerpt from the paper: "....These algorithms cannot be easily used in many retrieval algorithms because they must be compared with a Hamming distance, which is not easily adapted to accelerated search structures such as vocabulary trees or Approximate Nearest Neighbors (ANN)...."
(ORB, FREAK and BRISK)
FREAK works with locality-sensitive hashing. You can use it with FLANN (Fast Library for Approximate Nearest Neighbors), included in OpenCV.
For the BOW, only the first 5 to 8 bytes of the descriptor might be sufficient to construct the tree.
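One pragmatic workaround (not an official OpenCV recipe) is to cast the binary FREAK descriptors to float32 and feed them to cv2.BOWKMeansTrainer, accepting that the clustering is then Euclidean rather than Hamming. A minimal sketch, with placeholder image paths and an arbitrary vocabulary size:

    import cv2
    import numpy as np

    # FREAK gives uint8 binary descriptors; BOWKMeansTrainer only accepts CV_32F
    # and clusters with Euclidean distance, so this workaround ignores the
    # Hamming metric the descriptor was designed for.
    orb = cv2.ORB_create()                    # any detector, just for keypoints
    freak = cv2.xfeatures2d.FREAK_create()    # needs opencv-contrib

    bow_trainer = cv2.BOWKMeansTrainer(100)   # arbitrary vocabulary size

    for path in ["img1.jpg", "img2.jpg"]:     # placeholder image paths
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        kps = orb.detect(img, None)
        kps, desc = freak.compute(img, kps)   # desc is (n, 64) uint8
        if desc is not None:
            bow_trainer.add(desc.astype(np.float32))

    vocabulary = bow_trainer.cluster()        # (100, 64) float32 visual words

For plain descriptor matching outside BOW, cv2.FlannBasedMatcher can instead be constructed with the LSH index parameters (algorithm=6), so binary descriptors are handled with Hamming-style hashing, as mentioned above.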

What is the difference between K-means clustering and vector quantization?

What is the difference between K-means clustering and vector quantization?
They seem to be very similar.
I'm dealing with Hidden Markov Models and I need to extract symbols from feature vectors.
In order to extract symbols, do I do vector quantization or k-means clustering?
The way I understand it, K-means is one type of vector quantization.
The k-means algorithm is the specialization of the celebrated "Lloyd I" quantization algorithm to the case of empirical distributions (cf. Lloyd).
The Lloyd I algorithm is proved to yield a sequence of quantizers with decreasing quadratic distortion. However, except in the special case of one-dimensional log-concave distributions, it does not always converge to a quadratically optimal quantizer. (There are local minima of the quantization error, especially when dealing with an empirical distribution, i.e. for the clustering problem.)
A method that always converges toward an optimal quantizer is the so-called CLVQ (Competitive Learning Vector Quantization) algorithm, which also generalizes to the problem of more general L^p quantization. It is a kind of stochastic gradient method (cf. Pagès).
There are also some approaches based on genetic algorithms (cf. Hamida et al.), and/or classical optimization procedures for the one-dimensional case that converge faster (Pagès, Printems).
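To make the connection concrete, here is a minimal NumPy sketch of the Lloyd/k-means iteration used as a vector quantizer, turning feature vectors into discrete symbol indices for an HMM; the codebook size and data are placeholders:

    import numpy as np

    def kmeans_vq(features, k=16, iters=50, seed=0):
        # Lloyd iteration: alternate nearest-codeword assignment and centroid update.
        rng = np.random.default_rng(seed)
        codebook = features[rng.choice(len(features), k, replace=False)]
        for _ in range(iters):
            # (n, k) squared Euclidean distances from every vector to every codeword
            d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            symbols = d.argmin(axis=1)                  # quantize: vector -> symbol id
            for j in range(k):
                members = features[symbols == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)  # move codeword to cluster mean
        return codebook, symbols

    # Usage sketch: 1000 random 12-dimensional feature vectors -> 16 HMM symbols.
    feats = np.random.rand(1000, 12)
    codebook, symbols = kmeans_vq(feats)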

Resources