Matrix optimization in NLopt - vectorization

NLopt is an optimization library that implements a range of optimization algorithms and provides interfaces for several languages.
In order to use the LD_LBFGS algorithm in Julia, does the variable have to be a vector as opposed to a matrix?
If so, when we need to optimize an objective that is a scalar-valued function of a matrix variable, do we have to vectorize the matrix in order to use this package?

Yes, NLopt only understands vectors of decision variables. If your code is more naturally expressed in terms of matrices, you should reshape the vector into a matrix inside the function and derivative evaluation callbacks; in Julia, reshape returns a view, so no copy is made.
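A minimal sketch of the idea in Python, whose NLopt bindings use the same (x, grad) callback shape: the solver hands the callback a flat vector, and the callback views it as a matrix. The objective here (the squared Frobenius norm) is just an illustrative stand-in, and the callback is invoked by hand rather than through the solver.

```python
import numpy as np

n = 2  # the matrix variable is n x n, stored as a flat length-n*n vector

def objective(x, grad):
    # NLopt-style callback: x is the flat decision vector, grad is filled in place.
    X = x.reshape(n, n)              # view the flat vector as a matrix (no copy)
    if grad.size > 0:
        grad[:] = (2.0 * X).ravel()  # d/dX of ||X||_F^2 is 2X, flattened to match x
    return float(np.sum(X * X))      # illustrative objective: ||X||_F^2

# Calling the callback by hand, as the solver would during optimization
# (with the real bindings you would pass it to
#  nlopt.opt(nlopt.LD_LBFGS, n * n).set_min_objective(objective)):
x = np.array([1.0, 2.0, 3.0, 4.0])
g = np.zeros(n * n)
val = objective(x, g)
```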

Related

Direct Transcription of nonlinear system with cost function dependent on K matrices returned by time-varying LQR

I'm working on implementing a trajectory optimization algorithm named DIRTREL, which is essentially direct transcription with an added cost function. However, the cost function incorporates the K matrices obtained by linearizing the system around the decision variables (x, u) and applying discrete time-varying LQR. My question is how to express this in Drake most efficiently and concisely. My current approach describes the system symbolically, and the recursive nature of the Riccati difference equation produces extremely lengthy symbolic expressions (which only grow with more timesteps), so I also wonder whether the symbolic approach is appropriate at all.
For more details:
Specify my system as a LeafSystem
Declare a MathematicalProgram with decision variables x, u
To obtain time-varying linearized dynamics, specify a class that takes in the dynamics and decision variables at a single timestep and returns Jacobians for that timestep with symbolic.Jacobian(args)
Add cost function which takes in the entire trajectory, so all x, u
Inside the cost function:
Obtain linearized matrices A_i, B_i, G_i (G_i for noise) for each timestep by using the class that takes in decision variables and returns Jacobians
Compute the TVLQR cost (S[n]) with the Riccati difference equations employing the A_i and B_i matrices and solving for Ks
return a cost for the mathematical program that is essentially a large linear combination of the K matrices
One side note is I am not sure of the most tractable way to compute an inverse symbolically, but I am most concerned with my methodology and whether this symbolic description is appropriate.
I think there are several details on DIRTREL worth discussion:
The cost-to-go matrix S[n] depends on the linearized dynamics Ai, Bi. In DIRTREL you will need to solve a nonlinear optimization problem, which requires the gradient of the cost. To compute the gradient of your cost you will need the gradient of S[n], which in turn requires the gradients of Ai and Bi. Since Ai and Bi are themselves gradients of the dynamics function f(x, u), you will need the second-order gradient of the dynamics.
We had a paper on trajectory optimization that optimizes a cost function related to the LQR cost-to-go; DIRTREL made several improvements upon it. In our implementation we treated S as a decision variable as well, so the decision variables were x, u, S, with constraints including both the dynamics constraint x[n+1] = f(x[n], u[n]) and the Riccati equation as a constraint on S. I think DIRTREL's approach scales better because it has fewer decision variables, but I haven't compared the numerical performance of the two approaches.
I am not sure why you need to compute an inverse symbolically. First, what is the inverse you need to compute? Second, Drake supports automatic differentiation to compute gradients numerically. I would recommend numerical computation instead of symbolic computation: in numerical optimization you only need the values and gradients of the costs/constraints, and it is usually much more efficient to compute these numerically than to first derive a symbolic expression and then evaluate it.
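To make the recursion under discussion concrete, here is a plain-NumPy sketch of the backward Riccati pass that produces S[n] and the K gains. In line with the advice above, it avoids forming an explicit inverse by using a linear solve. The function name and signature are made up for illustration; this is not Drake code.

```python
import numpy as np

def tvlqr_cost_to_go(A, B, Q, R, Qf):
    # Backward Riccati recursion for discrete time-varying LQR.
    # A, B: lists of per-timestep linearized dynamics matrices A_i, B_i.
    # Returns the initial cost-to-go matrix S[0] and the list of K gains.
    S = Qf
    Ks = []
    for k in reversed(range(len(A))):
        # K_k = (R + B^T S B)^{-1} B^T S A, via a linear solve, not an inverse
        K = np.linalg.solve(R + B[k].T @ S @ B[k], B[k].T @ S @ A[k])
        S = Q + A[k].T @ S @ (A[k] - B[k] @ K)
        Ks.append(K)
    return S, Ks[::-1]
```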

Parameters Equivalence between scikit-learn and OpenCV (Decision Tree)

I'm trying to convert an implementation of several machine learning algorithms from scikit-learn to OpenCV.
First of all, do you know of any specific question/document where I can find the parameters equivalence?
If not, in the specific case of decision trees, is the max_categories parameter of OpenCV the equivalent of max_features in scikit-learn?
in the specific case of Decision Trees, is the max_categories of OpenCV the equivalent of max_features in scikit-learn?
It is not.
From the OpenCV docs:
maxCategories – Cluster possible values of a categorical variable into [...]
while scikit-learn does not even directly support categorical variables as predictors.
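A minimal sketch of the workaround this implies: a categorical column must be encoded numerically (e.g. one-hot) before a scikit-learn tree can consume it, which is the step that maxCategories-style handling in OpenCV makes unnecessary. Pure NumPy; the column and values are made up.

```python
import numpy as np

# A hypothetical categorical feature column:
colors = np.array(["red", "green", "red", "blue"])

# Map each value to an integer code (np.unique sorts the categories),
# then expand the codes into one-hot rows:
categories, codes = np.unique(colors, return_inverse=True)
one_hot = np.eye(len(categories))[codes]  # one row per sample
```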

Custom kernels for SVM, when to apply them?

I am new to the machine learning field and am currently trying to get a grasp of how the most common learning algorithms work and when to apply each of them. At the moment I am learning how Support Vector Machines work and have a question about custom kernel functions.
There is plenty of information on the web on more standard (linear, RBF, polynomial) kernels for SVMs. I, however, would like to understand when it is reasonable to go for a custom kernel function. My questions are:
1) What are other possible kernels for SVMs?
2) In which situations would one apply custom kernels?
3) Can a custom kernel substantially improve the prediction quality of an SVM?
1) What are other possible kernels for SVMs?
There are infinitely many of these; see, for example, the list of kernels implemented in pykernels (which is far from exhaustive):
https://github.com/gmum/pykernels
Linear
Polynomial
RBF
Cosine similarity
Exponential
Laplacian
Rational quadratic
Inverse multiquadratic
Cauchy
T-Student
ANOVA
Additive Chi^2
Chi^2
MinMax
Min/Histogram intersection
Generalized histogram intersection
Spline
Sorensen
Tanimoto
Wavelet
Fourier
Log (CPD)
Power (CPD)
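As a concrete example, one of the kernels listed above, the min/histogram intersection kernel K(x, y) = sum_i min(x_i, y_i), can be written as a precomputed Gram matrix in a few lines of NumPy (the function name is illustrative; such a matrix could be fed to any SVM implementation that accepts precomputed kernels):

```python
import numpy as np

def hist_intersection_gram(X, Y):
    # Gram matrix of the histogram intersection kernel between the rows
    # of X and the rows of Y: K(x, y) = sum_i min(x_i, y_i).
    # Valid for nonnegative feature vectors (e.g. histograms).
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)
```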
2) In which situations would one apply custom kernels?
Basically in two cases:
"simple" ones give very bad results
the data is specific in some sense, so that in order to apply traditional kernels you would have to degrade it. For example, if your data comes in graph form, you cannot apply an RBF kernel, since a graph is not a constant-size vector; you need a graph kernel to work with such objects without some kind of information-losing projection. Sometimes you also have insight into the data: you know about some underlying structure that might help the classifier. One example is periodicity; if you know there is a recurring effect in your data, it might be worth looking for a specific kernel, etc.
3) Can a custom kernel substantially improve the prediction quality of an SVM?
Yes. In particular, there always exists a (hypothetical) Bayes-optimal kernel, defined as:
K(x, y) = 1 iff arg max_l P(l|x) == arg max_l P(l|y)
In other words, if one has the true probability P(l|x) of label l being assigned to point x, then we can create a kernel that essentially maps data points onto one-hot encodings of their most probable labels, leading to Bayes-optimal classification (as it attains the Bayes risk).
In practice it is of course impossible to obtain such a kernel, as it would mean you had already solved your problem. However, it shows that there is a notion of an "optimal kernel", and obviously none of the classical ones is of this type (unless your data comes from very simple distributions). Furthermore, each kernel is a kind of prior over decision functions: the closer your induced family of functions gets to the actual one, the more likely you are to obtain a reasonable classifier with an SVM.
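The hypothetical Bayes-optimal kernel above is easy to write down once you are handed the (unattainable) true posteriors; a toy NumPy sketch, with made-up names:

```python
import numpy as np

def bayes_optimal_kernel(P):
    # P[i] holds the (true) posteriors P(l | x_i) for sample i.
    # K(x_i, x_j) = 1 iff the most probable labels of x_i and x_j
    # coincide, else 0 -- exactly the definition above.
    labels = P.argmax(axis=1)
    return (labels[:, None] == labels[None, :]).astype(float)
```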

how to efficiently do large matrix multiplications on Google cloud data flow?

We need to multiply a large matrix with a one-dimensional vector. The large matrix is sparse. In a second scenario, we need to multiply two large matrices, both of which are sparse. And in the third scenario, we need to multiply two large matrices both of which are dense.
Apache Spark seems to provide a built-in data type for matrices (including a specialized one for sparse matrices) as well as what seems to be a very rich set of libraries for matrix linear algebra (multiplication, addition, transposition, etc.)
How can one efficiently do these matrix multiplications (or other linear-algebra operations on matrices) on Google Cloud Dataflow for the three scenarios described above?
Dataflow currently doesn't support matrix operations natively. That said, it should be possible to implement these operations similarly to Spark.
For sparse matrices, it should be possible to key by the (x,y) coordinate, and then do a GroupByKey.
For dense matrices, you can divide the matrix into blocks, use a GroupByKey to group the blocks, and then use a native library (such as BLAS) to implement the multiplication on the blocks.
See BlockMatrix for more information on how the block operations are implemented in Spark.
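Dataflow dependencies aside, the key-by-(x, y)-plus-GroupByKey idea for sparse matrices can be sketched in plain Python, with dicts of nonzeros standing in for keyed PCollections (in a real pipeline the loops would become ParDo/GroupByKey stages; the function name is illustrative):

```python
from collections import defaultdict

def sparse_matmul(A, B):
    # C = A @ B where A and B are dicts {(row, col): value} holding only
    # the nonzeros, i.e. the representation you get after keying each
    # entry by its (x, y) coordinate.
    B_by_row = defaultdict(list)
    for (k, j), b in B.items():          # "shuffle" B's entries by row index
        B_by_row[k].append((j, b))
    partial = defaultdict(float)
    for (i, k), a in A.items():          # join on the shared inner index k
        for j, b in B_by_row[k]:
            partial[(i, j)] += a * b     # group by output cell (i, j) and sum
    return dict(partial)
```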

Doing SparseMat (sparse matrix) operations in openCV

I need to do matrix operations (mainly multiply and inverse) of a sparse matrix SparseMat in OpenCV.
I noticed that you can only iterate and insert values to SparseMat.
Is there an external code I can use? (or am I missing something?)
It's just that sparse matrices are not really suited for inversion or matrix-matrix multiplication, so it's quite reasonable that there is no built-in function for that. They're mostly used for matrix-vector multiplication (typically when solving linear systems iteratively).
What you can do is solve N linear systems (with the columns of the identity matrix as right-hand sides) to get the inverse matrix. But then you need N*N storage for the inverse anyway, so using a dense matrix with a standard decomposition algorithm would be a better approach, as the performance gain from N iterative solves won't be that high. Sparse direct solvers like SuperLU or TAUCS might also help, but I doubt that OpenCV has such functionality.
You should also ask whether you really need the inverse matrix. Such problems are often solvable by just solving a linear system, which can be done with a sparse matrix quite easily and quickly via, e.g., CG or BiCGStab.
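For illustration, here is a minimal conjugate-gradient solver in NumPy (a sketch, not production code): note that it only needs matrix-vector products, which is exactly the operation sparse formats are good at, so the caller can pass a matvec backed by any sparse representation.

```python
import numpy as np

def cg(matvec, b, tol=1e-10, maxiter=1000):
    # Conjugate gradient for a symmetric positive-definite system A x = b.
    # matvec(v) must return A @ v; A itself is never formed or inverted.
    x = np.zeros_like(b)
    r = b - matvec(x)        # residual
    p = r.copy()             # search direction
    rs = r @ r
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```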
You can convert a SparseMat to a Mat, do what operations you need and then convert back.
You can use the Eigen library directly; Eigen works very well together with OpenCV.
