Strict Positive Definite Matrix in cvxpy python - cvxpy

I am trying to use cvxpy in Python to solve an LMI.
How can I define a strictly positive definite matrix variable here?
For a 100 × 100 matrix, for example, the syntax is
X = cp.Variable((100, 100), PSD=True)
but that only enforces X >= 0 (positive semidefinite).
I need X > 0 (strictly positive definite).
I have read the cvxpy documentation and searched through it, but there is no such option.
Thanks in advance.

Strict inequalities are not possible, because a strict inequality has no well-defined meaning when the computations are done in finite-precision floating-point arithmetic.
But you can require
(X - I*eps) is PSD
where I is the identity matrix and eps is a small positive number, say 1.0e-6.
It should not be too small.
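
A minimal cvxpy sketch of this workaround (the size n, the zero objective, and the placeholder comment for your LMI constraints are assumptions for illustration):

import cvxpy as cp
import numpy as np

n = 100                                    # problem size (placeholder)
eps = 1e-6                                 # strictness margin; should not be too small
X = cp.Variable((n, n), symmetric=True)
constraints = [X - eps * np.eye(n) >> 0]   # X >= eps*I, i.e. X positive definite up to eps
# ... add the LMI constraints of your problem here
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()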

Related

Eigenvalues of symmetric band matrix using Accelerate framework

In a macOS/iOS code base, I've got a real symmetric band matrix that can be anywhere from 10 × 10 to about 500 × 500, and I need to compute whether all its eigenvalues are greater than (or equal to) a certain threshold. So I only strictly need to know the lowest eigenvalue, in case that helps.
Is there any function or set of functions in Apple's Accelerate framework that can provide a full or partial solution to this? Ideally with a cost proportional to the number of non-zero entries.
Based on this, it appears there's a set of LAPACK functions that compute eigenvalues efficiently for banded symmetric matrices. (LAPACK is implemented as part of the Accelerate framework.)
As I understand it, ssbtrd followed by ssterf should do the trick.
SSBTRD reduces a real symmetric band matrix A to symmetric
tridiagonal form T by an orthogonal similarity transformation:
Q**T * A * Q = T.
SSTERF computes all eigenvalues of a symmetric tridiagonal matrix
using the Pal-Walker-Kahan variant of the QL or QR algorithm.
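
Not Accelerate, but for a quick sanity check of the banded-symmetric route, here is a small SciPy sketch that exercises the same family of banded LAPACK eigensolvers; the 4 × 4 tridiagonal test matrix and the threshold are made-up placeholders:

import numpy as np
from scipy.linalg import eigvals_banded

# Upper banded storage: row 0 = superdiagonal (first entry unused), row 1 = main diagonal.
a_band = np.array([[0.0, -1.0, -1.0, -1.0],
                   [2.0,  2.0,  2.0,  2.0]])

# Request only the smallest eigenvalue (index 0) instead of the full spectrum.
lowest = eigvals_banded(a_band, select='i', select_range=(0, 0))[0]

threshold = 0.1
print(lowest, lowest >= threshold)   # ~0.38, True for this test matrix

In C against Accelerate, the corresponding calls would be the banded routines quoted above (ssbtrd followed by ssterf, or ssbevx to get selected eigenvalues directly).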

Write Dirichlet Log Likelihood with DCP ruleset

I would like to write the log likelihood of the Dirichlet density as a disciplined convex programming (DCP) optimization problem with respect to the parameters of the Dirichlet distribution alpha. However, the log likelihood
import numpy as np
import scipy.special

def dirichlet_log_likelihood(p, alpha):
    """Log of Dirichlet density.
    p: Numpy array of shape (K,) that sums to 1.
    alpha: Numpy array of shape (K,) with positive elements.
    """
    L = np.log(scipy.special.gamma(alpha.sum()))
    L -= np.log(scipy.special.gamma(alpha)).sum()
    L += np.sum((alpha - 1) * np.log(p))
    return L
despite being concave in alpha, is not DCP-compliant as written, because it involves the difference of the two convex functions np.log(gamma(alpha.sum())) and np.log(gamma(alpha)).sum(). I would like, if possible, to formulate this function of alpha so that it follows the DCP ruleset, so that maximum-likelihood estimation of alpha can be performed with cvxpy.
Is this possible, and if so, how might I do it?
As you note, np.log(gamma(alpha.sum())) and -np.log(gamma(alpha)).sum() have different curvature, so you need to combine them as
np.log(gamma(alpha.sum()) / gamma(alpha).prod())
to have any chance of modelling them under the DCP ruleset. The combined expression above can be recognized as the negative logarithm of the multivariate beta function, and since the multivariate beta function can be written as a product of bivariate beta functions (see here), you can expand the log-product into a sum of terms of the form
np.log(beta(x, y))
and this log-beta is the convex atom you need in your DCP formulation. What remains for you, to use it in practice, is to feed an approximation of this atom into cvxpy. The np.log(gamma(x)) approximation here will serve as a good starting point.
Please see math.stackexchange.com for more details.
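
A small numerical check of the decomposition used above, with SciPy's gammaln/betaln and a made-up alpha, just to confirm the identity before building a DCP atom around it:

import numpy as np
from scipy.special import betaln, gammaln

alpha = np.array([0.7, 2.0, 3.5, 1.2])   # example parameters (placeholder)

# log of the multivariate beta function, computed directly ...
log_B_direct = gammaln(alpha).sum() - gammaln(alpha.sum())

# ... and via the product of bivariate beta functions:
#   B(a_1, ..., a_K) = prod_{k=2..K} B(a_1 + ... + a_{k-1}, a_k)
log_B_pairwise = sum(betaln(alpha[:k].sum(), alpha[k]) for k in range(1, len(alpha)))

print(np.isclose(log_B_direct, log_B_pairwise))   # True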

Math Behind Linear Regression

I am trying to understand the math behind linear regression, and I have verified on multiple sites that linear regression uses the OLS method with y = mx + c to get the best-fit line.
So, to calculate the intercept and slope, we use the formulas below (if I am not wrong):
m = sum of [ (x - mean(x)) * (y - mean(y)) ] / sum of [ (x - mean(x))^2 ]
c = mean(y) - m * mean(x)
With these we get the m and c values to substitute into the equation above to get the predicted y values, and we can then predict for new x values.
But my doubt is: when is gradient descent used? I understand it is also used for calculating the coefficients, in such a way that it reduces the cost function by finding a local minimum.
Please help me with this.
Do these two have separate functions in Python/R?
Or does linear regression work on gradient descent by default (and if so, when is the above formula used for calculating the m and c values)?
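
A minimal Python sketch of the closed-form formulas above, with made-up data, compared against NumPy's least-squares fit; gradient descent would reach the same m and c iteratively by minimising the squared-error cost instead of using the closed form:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # example data (placeholder)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()

print(m, c)                    # closed-form OLS slope and intercept
print(np.polyfit(x, y, 1))     # [slope, intercept] from NumPy, for comparison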

SVM - Can I normalize W vector?

In SVM, is there something wrong with normalizing the W vector like this:
for each i: W_i = W_i / norm(W)
I am confused. At first sight it seems that the result sign(<W, x>) will be the same. But if so, in the loss function norm(W)^2 + C*Sum(hinge_loss) we could minimize W just by doing W = W / (large number).
So, where am I wrong?
I suggest you read either my minimal 5 ideas of SVMs or, better,
[Bur98] C. J. Burges, “A tutorial on support vector machines for pattern recognition”, Data mining and knowledge discovery, vol. 2, no. 2, pp. 121–167, 1998.
To answer your question: SVMs define a hyperplane to separate the data. A hyperplane is defined by a normal vector w and a bias b: it is the set of points x with w^T x + b = 0.
If you change only w, this gives a different hyperplane. However, SVMs do more tricks (see my 5 ideas), and the weight vector is actually normalized so that it stands in a fixed relationship to the margin between the two classes.
I think you are missing the constraint that
r(w^T x + w_0) >= 1 for all examples; normalizing the weight vector would violate this constraint.
In fact, this constraint is introduced in the SVM in the first place to achieve a unique solution; otherwise, as you mentioned, infinitely many solutions are possible just by scaling the weight vector.
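
A small NumPy sketch of the point made above, with made-up data and weights: scaling (w, b) down leaves the predicted signs unchanged, but the margin constraint y*(w^T x + w_0) >= 1 breaks and the hinge term of the objective grows, so shrinking w does not minimise the full objective:

import numpy as np

X = np.array([[2.0, 1.0], [-1.5, -2.0]])    # toy data (placeholder)
y = np.array([1, -1])
w, b = np.array([1.0, 0.5]), -0.2           # toy weights (placeholder)

def objective(w, b, C=1.0):
    margins = y * (X @ w + b)
    return np.dot(w, w) + C * np.maximum(0.0, 1.0 - margins).sum()

for scale in (1.0, 0.1):
    ws, bs = scale * w, scale * b
    print(np.sign(X @ ws + bs),        # same signs for both scales
          y * (X @ ws + bs) >= 1,      # margin constraint holds only at scale 1.0 here
          objective(ws, bs))           # objective is larger after scaling down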

How to find an eigenvector given eigenvalue 1, minimising memory use

I'd be grateful if people could help me find an efficient way (probably a low-memory algorithm) to tackle the following problem.
I need to find the stationary distribution x of a transition matrix P. The transition matrix is an extremely large, extremely sparse matrix, constructed such that all the columns sum to 1. Since the stationary distribution is given by the equation Px = x, then x is simply the eigenvector of P associated with eigenvalue 1.
I'm currently using GNU Octave to generate the transition matrix, find the stationary distribution, and plot the results. I'm using the function eigs(), which calculates both eigenvalues and eigenvectors, and it is possible to return just one eigenvector where the eigenvalue is near 1 (I actually had to specify 1.1 to prevent an error). Construction of the transition matrix (using a sparse matrix) is fairly quick, but finding the eigenvector gets increasingly slow as I increase the size, and I'm running out of memory before I can examine even moderately sized problems.
My current code is
[v l] = eigs(P, 1, 1.01);   % one eigenpair with eigenvalue closest to 1.01
x = v / sum(v);             % normalise so the entries sum to 1
Given that I know that 1 is the eigenvalue, I'm wondering if there is either a better method to calculate the eigenvector, or a way that makes more efficient use of memory, given that I don't really need an intermediate large dense matrix. I naively tried
n = size(P,1); % number of states
Q = P - speye(n,n);
x = Q\zeros(n,1); % solve (P-I)x = 0
which fails, since Q is singular (by definition).
I would be very grateful if anyone has any ideas on how I should approach this, as it's a calculation I have to perform a great number of times, and I'd like to try it on larger and more complex models if possible.
As background to this problem, I'm solving for the equilibrium distribution of the number of infectives in a cattle herd in a stochastic SIR model. Unfortunately the transition matrix is very large for even moderately sized herds. For example: in an SIR model with an average of 20 individuals (95% of the time the population is between 12 and 28 individuals), P is 21169 by 21169 with 20340 non-zero values (i.e. 0.0005% dense), and uses up 321 Kb (a full matrix of that size would be 3.3 Gb), while for around 50 individuals P uses 3 Mb. x itself should be pretty small. I suspect that eigs() has a dense matrix somewhere, which is causing me to run out of memory, so I should be okay if I can avoid using full matrices.
Power iteration is a standard way to find the dominant eigenvalue of a matrix. You pick a random vector v, then hit it with P repeatedly until you stop seeing it change very much. You want to periodically divide v by sqrt(v^T v) to normalise it.
The rate of convergence here is proportional to the separation between the largest eigenvalue and the second largest eigenvalue. Each iteration takes just a sparse matrix-vector multiply (plus an occasional normalisation).
There are fancier-pants ways to do this ("PageRank" is one good thing to search for here) that improve speed for really huge sparse matrices, but I don't know that they're necessary or useful here.
Your approach seems like a good one. However, what you're calling x is the null space of Q. null(Q) would work if it supported sparse matrices, but it doesn't. There's a bunch of material on the web about finding the null space of a sparse matrix (a SciPy sketch of the idea follows the links below). For example:
http://www.mathworks.co.uk/matlabcentral/newsreader/view_thread/249467
http://www.mathworks.com/matlabcentral/fileexchange/42922-null-space-for-sparse-matrix/content/nulls.m
http://www.mathworks.com/matlabcentral/fileexchange/11120-null-space-of-a-sparse-matrix
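
To make the null-space idea concrete, here is a SciPy sketch (assuming P is an n-by-n sparse column-stochastic matrix and the chain is irreducible): because the columns of P - I sum to zero, one of the n equations in (P - I)x = 0 is redundant and can be replaced by the normalisation sum(x) = 1, giving a nonsingular sparse system.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def stationary_distribution(P):
    n = P.shape[0]
    Q = (P - sp.eye(n)).tolil()
    Q[-1, :] = 1.0                # replace the last (redundant) equation by sum(x) = 1
    rhs = np.zeros(n)
    rhs[-1] = 1.0
    return spsolve(Q.tocsc(), rhs)

Note that a sparse direct solve can still suffer from fill-in on very large models, which is where the power-iteration route below has the edge on memory.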
It seems the best solution is to use the Power Iteration method, as suggested by tmyklebu.
The method is to iterate x = Px; x /= sum(x), until x converges. I'm assuming convergence when the L1 norm of the difference between successive iterates is less than 1e-5, as that seems to give good results.
Convergence can take a while, since the largest two eigenvalues are fairly close (the number of iterations needed to converge can vary considerably, from around 200 to 2000 depending on the model used and population sizes, but it gets there in the end). However, the memory requirements are low, and it's very easy to implement.
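
For reference, a short SciPy sketch of that iteration (a Python analogue of the Octave loop; the random sparse column-stochastic matrix is a made-up stand-in for the real P):

import numpy as np
import scipy.sparse as sp

n = 1000
# Random sparse non-negative matrix plus the identity (to avoid empty columns),
# with columns normalised to sum to 1, standing in for the transition matrix P.
P = sp.random(n, n, density=0.005, format='csc', random_state=0) + sp.eye(n, format='csc')
P = P @ sp.diags(1.0 / np.asarray(P.sum(axis=0)).ravel())

x = np.full(n, 1.0 / n)                    # start from the uniform distribution
for _ in range(10000):
    x_new = P @ x
    x_new /= x_new.sum()                   # keep it a probability distribution
    done = np.abs(x_new - x).sum() < 1e-5  # L1 convergence test, as described above
    x = x_new
    if done:
        break

print(x.sum(), np.abs(P @ x - x).max())    # x sums to 1 and approximately satisfies Px = x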
