In an iOS app, I need to solve a linear equation Ax = B, where A is a sparse matrix with 40K rows and 10K columns.
Accelerate has a Sparse Solver package, but it is still in beta:
https://developer.apple.com/documentation/accelerate/sparse_solvers
I wonder if I can use BLAS to solve the linear equation? BLAS contains functions to define a sparse matrix, but I don't see any solver functions.
Related
Recently,I have been going through lectures and texts and trying to understand how SVM's enable use to work in higher dimensional space.
In normal logistic regression,we use the features as it is..but in SVM's we use a mapping which helps us attain a non linear decision boundary.
Normally we work directly with features..but with the help of kernel trick we can find relations in data using square of the features..product between them etc..is this correct?
We do this with the help of kernel.
Now..i understand that a polynomial kernel corresponds to a known feature vector..but i am unable to understand what gaussian kernel corresponds to( i am told an infinite dimensional feature vector..but what?)
Also,I am unable to grasp the concept that kernel is a measure of similiarity between training examples..how is this a part of the SVM's working?
I have spent lot of time trying to understand these..but in vain.Any help would be much apppreciated!
Thanks in advance :)
Normally we work directly with features..but with the help of kernel trick we can find relations in data using square of the features..product between them etc..is this correct?
Even using a kernel you still work with features, you can simply exploit more complex relations of these features. Such as in your example - polynomial kernel gives you access to low-degree polynomial relations between features (such as squares, or products of features).
Now..i understand that a polynomial kernel corresponds to a known feature vector..but i am unable to understand what gaussian kernel corresponds to( i am told an infinite dimensional feature vector..but what?)
Gaussian kernel maps your feature vector to the unnormalized Gaussian probability density function. In other words, you map each point onto a space of functions, where your point is now a Gaussian centered in this point (with variance corresponding to the hyperparameter gamma of the gaussian kernel). Kernel is always a dot product between vectors. In particular, in function spaces L2 we define classic dot product as an integral over the product, so
<f,g> = integral (f*g) (x) dx
where f,g are Gaussian distributions.
Luckily, for two Gaussian densities, integral of their product is also a Gaussian, this is why gaussian kernel is so similar to the pdf function of the gaussian distribution.
Also,I am unable to grasp the concept that kernel is a measure of similiarity between training examples..how is this a part of the SVM's working?
As mentioned before, kernel is a dot product, and dot product can be seen as a measure of similarity (it is maximized when two vectors have the same direction). However it does not work the other way around, you cannot use every similarity measure as a kernel, because not every similarity measure is a valid dot product.
just a bit of introduction about svm before i start answering the question. This will help you get overview about svm. Svm task is to find the best margin-maximing hyperplane that best separates the data. We have soft margin representation of svm which is also known as primal form and its equivalent form is dual form of svm. Dual form of svm makes the use of kernel trick.
kernel trick is partially replacing the feature engineering which is the most important step in machine learning when we have datasets that are not linear (eg. datasets in shape of concentric circles).
Now you can transform this dataset from non-linear to linear by both FE and kernel trick. By FE you can square each of the features in this dataset and it will transform into linear dataset and then you can apply techniques like logisitic regression which work best for linear data.
In kernel trick you can use the polynomial kernel whose general form is (a + x_i(transpose)x_j)^d where a and d are constants and d specifies the degree, suppose if the degree is 2 then we can say it is quadratic and likewise. now lets say we apply d =2 now our equation becomes (a + x_i(transpose)x_j)^2. lets the we 2 features in our original dataset (eg. for x_1 vector is [x_11,x__12] and x_2 the vector is [x_21,x_22]) now when we apply polynomial kernel on this we get we get 6-d vectors. Now we have transformed the features from 2-d to 6-d.Now you can see intuitively that higher your dimension of data better would svm work because it will eventually transform that features to a higher space. Infact the best case of svm is if you have high dimensionality, then go for svm.
Now you can see both kernel trick and Feature Engineering solve and transforms the dataset(concentric circle one) but the difference is we are doing FE explicitly but kernel trick comes implicitly with svm. There is also a general purpose kernel know as Radial basis function kernel which can be used when you don't know the kernel in advance.
RBF kernel has a parameter (sigma) if the value of sigma is set to 1, then you get a curve that looks like gaussian curve.
You can consider kernel as a similarity metric only and can interpret if the distance between the points is less, higher will be the similarity.
NLopt is a solver for optimization, which implements different optimization algorithms and is implemented in different languages.
In order to use the LD_LBFGS algorithm in Julia, does the variable have to be a vector as opposed to a matrix?
If yes, once we need to optimize an objective which is a univariate function of a matrix variable, do we have to vectorize the matrix to be able to use this package?
Yes, NLopt only understands vectors of decision variables. If your code is more naturally expressed in terms of matrices, then you should convert the vector into a matrix in the function and derivative evaluation callbacks using reinterpret.
Hello I am working on a project involving in face recognition for which I am using Linear Discriminant Analysis(LDA). LDA demands to find the generalized eigen vectors for the between class scatter matrix and with in class scatter matrix and that is where I am struck. I am using opencv with DevC++ for coding. Basically the problem looks like
A*v=lambda*B*v
where A and B are matrices for which generalized eigen vectors should be found
lambda is eigen values and v is vectors
Upon searching about this problem many people suggested to go for calculating the inverse of B and then multiplying with A*v
(inv(B)*A)*v=lambda*v
and then calculate eigen vectors for inv(B)*A.
It seems to be a good solution but in my case the scatter matrix B is almost sigular. I found its determinant is in the order of 10^-36 .So I cant find its inverse and proceed with the above solution. So Can some one suggest me a way to get out of this problem except saying to code for generalized eigen value problem separately.
I am providing a Fisherfaces implementation in my github repository at https://github.com/bytefish/opencv/tree/master/lda. This includes the implementation of an eigenvalue solver for general matrices, see: https://github.com/bytefish/opencv/blob/master/lda/include/decomposition.hpp (I've ported the great JAMA solver), which is exactely what you are looking for.
If you have problems with the code, please drop me a note on the projects page at http://www.bytefish.de/blog/fisherfaces_in_opencv.
I need to do matrix operations (mainly multiply and inverse) of a sparse matrix SparseMat in OpenCV.
I noticed that you can only iterate and insert values to SparseMat.
Is there an external code I can use? (or am I missing something?)
It's just that sparse matrices are not really suited for inversion or matrix-matrix-multiplication, so it's quite reasonable there is no builtin function for that. They're actually more used for matrix-vector multiplication (usually when solving iterative linear systems).
What you can do is solve N linear systems (with the columns of the identity matrix as right hand sides) to get the inverse matrix. But then you need N*N storage for the inverse matrix anyway, so using a dense matrix with a usual decompositions algorithm would be a better way to do it, as the performance gain won't be that high when doing N iterative solutions. Or maybe some sparse direct solvers like SuperLU or TAUCS may help, but I doubt that OpenCV has such functionalities.
You should also think if you really need the inverse matrix. Often such problem are also solvable by just solving a linear system, which can be done with a sparse matrix quite easily and fast via e.g. CG or BiCGStab.
You can convert a SparseMat to a Mat, do what operations you need and then convert back.
you can use Eigen library directly. Eigen works together with OpenCV very well.
I am using libsvm to train a SVM with hog features. The model file has n support vectors. But when I try to use it in OpenCV's SVM I found that there is only one vector in OpenCV's model. How does OpenCV do it??
I guess libsvm stores support vectors, whereas opencv just uses a weight vector to store the hyperplane (one vector + one scalar suffices to describe a plane) - you can get there from the decision function using the support vectors by swapping sum and scalar product.
Here is the explanation from Learning OpenCV3:
In the case of linear SVM, all the support vectors for each decision plane can be compressed into a single vector that will basically describe the separating hyperplane.