Translating an optimization problem from CVX to CVXPY?

I am attempting to translate a semidefinite programming problem from CVX to CVXPY as described here. My attempt follows:
import cvxpy as cvx
import numpy as np
c = [0, 1]
n = len(c)
# Create optimization variables.
f = cvx.Variable((n, n), hermitian=True)
# Create constraints.
constraints = [f >> 0]
for k in range(1, n):
    indices = [(i * n) + i - (n - k) for i in range(n - k, n)]
    constraints += [cvx.sum(cvx.vec(f)[indices]) == c[n - k]]
# Form objective.
obj = cvx.Maximize(c[0] - cvx.trace(f))
# Form and solve problem.
prob = cvx.Problem(obj, constraints)
sol = prob.solve()
print(sol)
print(f.value)
The issue here is that when I take the coefficients of the Fourier series and put them into the array c, it fails on complex values. I think this is due to a discrepancy between the maximize function of CVX and that of CVXPY; I'm not sure what CVX is maximizing, since the trace of the matrix is a complex value. As pointed out below, the trace is real since the matrix is Hermitian, but the code still fails. Can someone with CVXPY knowledge clear this up?
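One hedged guess (a sketch, not verified against the asker's data): CVXPY rejects objectives that are not explicitly real-valued, even when they are mathematically real, so wrapping the expression in cvx.real may be enough:
import cvxpy as cvx
# Hedged sketch: declare the objective real explicitly. trace(f) is
# mathematically real for Hermitian f, but wrapping it in cvx.real
# makes that explicit to CVXPY; c[0] is assumed real here.
obj = cvx.Maximize(cvx.real(c[0] - cvx.trace(f)))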


Efficient pseudo-inverse for PyTorch 2D convolution

Background:
I am learning the basics of 2D convolution, linear algebra and PyTorch, and I have run into a problem implementing the pseudo-inverse of the convolution operator. Specifically, I have no idea how to implement it in an efficient way. Please see the following problem statement for details. Any help/tip/suggestion is welcome.
(Thanks a lot for your attention!)
The Original Problem:
I have an image feature x with shape [b,c,h,w] and a 3x3 convolutional kernel K with shape [c,c,3,3], so that y = K * x. How can I implement the corresponding pseudo-inverse on y in an efficient way?
Writing y = K * x = Ax, how do I implement x_hat = (A^+)y?
I guess there should be some operations using torch.fft. However, I still have no idea how to implement it, and I do not know whether an implementation already exists.
import torch
import torch.nn.functional as F
c = 32
K = torch.randn(c, c, 3, 3)
x = torch.randn(1, c, 128, 128)
y = F.conv2d(x, K, padding=1)
print(y.shape)
# How to implement pseudo-inverse for y = K * x in an efficient way?
Some of My Efforts:
I know that 2D convolution is a linear operator, so it is equivalent to a matrix product: we could write out the matrix form of the convolution and calculate its pseudo-inverse. However, I think this type of operation will be inefficient, and I have no idea how to implement it in an efficient way.
According to Wikipedia, the pseudo-inverse satisfies A(A_pinv(A(x))) = A(x); in particular, when A has full column rank, A_pinv recovers x from y = A(x). Here A is the convolution operator, A_pinv is its pseudo-inverse, and x may be any image feature.
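To make that matrix-form idea concrete, here is a minimal single-channel sketch (sizes and names are ours, chosen for illustration; this is the inefficient baseline, not the goal): it builds the dense matrix A column by column by pushing unit impulses through conv2d, then inverts with torch.linalg.pinv.
import torch
import torch.nn.functional as F
# Build the dense matrix A of a 3x3 conv (padding=1) on a tiny 8x8 image
# by convolving one unit impulse per pixel; column i of A is vec(conv(e_i)).
h = w = 8
k = torch.randn(1, 1, 3, 3)
impulses = torch.eye(h * w).reshape(h * w, 1, h, w)
A = F.conv2d(impulses, k, padding=1).reshape(h * w, h * w).T  # A @ vec(x) == vec(y)
x = torch.randn(1, 1, h, w)
y = F.conv2d(x, k, padding=1)
# Recover x by applying the pseudo-inverse to vec(y).
x_hat = (torch.linalg.pinv(A) @ y.reshape(-1)).reshape(1, 1, h, w)
print(torch.mean((x - x_hat) ** 2))  # near zero when A happens to be full rank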
(Thanks again for reading such a long post!)
This takes the problem to another level.
The convolution itself is a linear operation: you can determine the matrix of the operation and solve a least-squares problem directly [1], or compute the pseudo-inverse as you mentioned and then apply it to different outputs, recovering a projection of the input.
I am changing your code to use padding=0:
import torch
import torch.nn.functional as F
# your code
c = 32
K = torch.randn(c, c, 3, 3)
x = torch.randn(4, c, 128, 128)
y = F.conv2d(x, K, bias=torch.zeros((c,)))
Also, as you probably already suspected, the convolution can be computed as ifft(fft(h)*fft(x)). However, conv2d actually computes a cross-correlation, so you have to conjugate the filter, leading to ifft(fft(h).conj()*fft(x)). You also have to apply this along two axes, and you have to make sure both FFTs are calculated at the same size; since the data is real, we can use the multi-dimensional real FFT. To be complete, conv2d works on multiple channels, so we have to calculate summations of convolutions. Since the FFT is linear, we can simply compute the summations in the frequency domain using einsum.
s = y.shape[-2:]
K_f = torch.fft.rfftn(K, s)
x_f = torch.fft.rfftn(x, s)
y_f = torch.einsum('jkxy,ikxy->ijxy', K_f.conj(), x_f)
y_hat = torch.fft.irfftn(y_f, s)
Except for the borders it should be accurate (remember FFT computes a cyclic convolution).
torch.max(abs(y_hat[:,:,:-2,:-2] - y[:,:,:,:]))
Now, notice the pattern jk,ik->ij in the einsum: it means y_f[i,j] = sum(K_f[j,k] * x_f[i,k]), i.e. y_f = x_f @ K_f.T, if @ denotes the matrix product over the first two dimensions. So to invert this operation we can interpret the first two dimensions as matrices. The function torch.linalg.pinv computes pseudo-inverses on the last two axes, so in order to use it we have to permute the axes. If we right-multiply the output by the pseudo-inverse of the transposed K_f we should invert this operation.
s = 128,128
K_f = torch.fft.rfftn(K, s)
K_f_inv = torch.linalg.pinv(K_f.T).T
y_f = torch.fft.rfftn(y_hat, s)
x_f = torch.einsum('jkxy,ikxy->ijxy', K_f_inv.conj(), y_f)
x_hat = torch.fft.irfftn(x_f, s)
print(torch.mean((x - x_hat)**2) / torch.mean((x)**2))
Notice that I am using the full (circular) convolution, but conv2d actually cropped the image borders. Let's apply that cropping:
k = 3  # kernel size; conv2d with padding=0 crops k-1 = 2 rows/columns
y_hat[:, :, 128-(k-1):, :] = 0
y_hat[:, :, :, 128-(k-1):] = 0
Repeating the calculation, you will see that the reconstructed input is no longer accurate, so you have to be careful about what you do with your convolution; but in situations where you can get this to work, it will in fact be efficient.
s = 128,128
K_f = torch.fft.rfftn(K, s)
K_f_inv = torch.linalg.pinv(K_f.T).T
y_f = torch.fft.rfftn(y_hat, s)
x_f = torch.einsum('jkxy,ikxy->ijxy', K_f_inv.conj(), y_f)
x_hat = torch.fft.irfftn(x_f, s)
print(torch.mean((x - x_hat)**2) / torch.mean((x)**2))

Is there a way to get the final system of equations sent by cvxpy to the solver?

If I understand correctly, cvxpy converts our high-level problem description to the standard canonical form before it is sent to a solver.
By the standard form I mean the form that can be used by the descent algorithms; for instance, it would rewrite every absolute value in the objective as a difference of two nonnegative variables with some new constraints, etc.
I am wondering if it's possible to see what the reduction looked like for a problem I specify in cvxpy?
For instance, let's say I have the following problem:
import numpy as np
import cvxpy as cp
x = cp.Variable(2)
L = np.asarray([[1,2],[2,3]])
P = L.T @ L
constraints = []
constraints.append(x >= [-10, -10])
constraints.append(x <= [10, 10])
obj = cp.Minimize(cp.quad_form(x, P) - np.array([1, 2]) @ x)
prob = cp.Problem(obj, constraints)
prob.solve(), prob.solver_stats.solver_name
(-0.24999999999999453, 'OSQP')
So, I would like to see the actual arguments (P, q, A, l, u) being sent to the OSQP solver https://github.com/oxfordcontrol/osqp-python/blob/master/module/interface.py#L278
Any help is greatly appreciated!
From looking at the documentation, it seems you can do this using the command get_problem_data as follows:
data, chain, inverse_data = prob.get_problem_data(prob.solver_stats.solver_name)
I have not tried it, and the docs say its output depends on the particular solver and the solver chain, but it may help you!
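As a hedged illustration with the example problem above (the exact dict keys vary across cvxpy versions and solvers, so inspect data.keys() before assuming any particular layout):
# Canonicalized problem as handed to OSQP; key names depend on the
# cvxpy version, so print them rather than assuming (P, q, A, l, u).
data, chain, inverse_data = prob.get_problem_data(cp.OSQP)
print(data.keys())
for key, val in data.items():
    print(key, getattr(val, 'shape', val))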

Optimise custom gaussian processes kernel in scikit using gridsearch

I'm working with Gaussian processes and when I use the scikit-learn GP modules I struggle to create and optimise custom kernels using GridSearchCV. The best way to describe this problem is using the classic Mauna Loa example, where the appropriate kernel is constructed using a combination of already defined kernels such as RBF and RationalQuadratic. In that example the parameters of the custom kernel are not optimised but treated as given. What if I wanted to run a more general case where I would want to estimate those hyperparameters using cross-validation? How should I go about constructing the custom kernel and then the corresponding param_grid object for the grid search?
In a very naive way I could construct a custom kernel using something like this:
def custom_kernel(a, b, ls, l, alpha, nl):
    kernel = a * RBF(length_scale=ls) \
        + b * RationalQuadratic(length_scale=l, alpha=alpha) \
        + WhiteKernel(noise_level=nl)
    return kernel
however this function can't of course be called from GridSearchCV using e.g. GaussianProcessRegressor(kernel=custom_kernel(a,b,ls,l,alpha,nl)).
One possible path forward is presented in this SO question; however, I was wondering if there's an easier way than coding the kernel (along with its hyperparameters) from scratch, as I'm looking to work with combinations of standard kernels and may also want to mix them up.
So this is how far I got. It answers the question, but it is really, really slow for the Mauna Loa example; then again, that's probably a difficult dataset to work with:
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.gaussian_process.kernels import ConstantKernel,RBF,WhiteKernel,RationalQuadratic,ExpSineSquared
import numpy as np
from sklearn.datasets import fetch_openml
# from https://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gpr_co2.html
def load_mauna_loa_atmospheric_co2():
    # as_frame=False keeps numpy arrays, so the column indexing below works
    ml_data = fetch_openml(data_id=41187, as_frame=False)
    months = []
    ppmv_sums = []
    counts = []
    y = ml_data.data[:, 0]
    m = ml_data.data[:, 1]
    month_float = y + (m - 1) / 12
    ppmvs = ml_data.target
    for month, ppmv in zip(month_float, ppmvs):
        if not months or month != months[-1]:
            months.append(month)
            ppmv_sums.append(ppmv)
            counts.append(1)
        else:
            # aggregate monthly sum to produce average
            ppmv_sums[-1] += ppmv
            counts[-1] += 1
    months = np.asarray(months).reshape(-1, 1)
    avg_ppmvs = np.asarray(ppmv_sums) / counts
    return months, avg_ppmvs
X, y = load_mauna_loa_atmospheric_co2()
# Kernel with parameters given in GPML book
k1 = ConstantKernel(constant_value=66.0**2) * RBF(length_scale=67.0)  # long-term smooth rising trend
k2 = ConstantKernel(constant_value=2.4**2) * RBF(length_scale=90.0) \
    * ExpSineSquared(length_scale=1.3, periodicity=1.0)  # seasonal component
k3 = ConstantKernel(constant_value=0.66**2) \
    * RationalQuadratic(length_scale=1.2, alpha=0.78)  # medium-term irregularity
k4 = ConstantKernel(constant_value=0.18**2) * RBF(length_scale=0.134) \
    + WhiteKernel(noise_level=0.19**2)  # noise terms
kernel_gpml = k1 + k2 + k3 + k4
gp = GaussianProcessRegressor(kernel=kernel_gpml)
# print parameters
print(gp.get_params())
param_grid = {'alpha': np.logspace(-2, 4, 5),
              'kernel__k1__k1__k1__k1__constant_value': np.logspace(-2, 4, 5),
              'kernel__k1__k1__k1__k2__length_scale': np.logspace(-2, 2, 5),
              'kernel__k2__k2__noise_level': np.logspace(-2, 1, 5)
              }
grid_gp = GridSearchCV(gp, cv=5, param_grid=param_grid, n_jobs=4)
grid_gp.fit(X, y)
What helped me was to initialise the model first as gp = GaussianProcessRegressor(kernel=kernel_gpml) and then use the get_params method to get a list of the model's hyperparameters.
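For example, a small sketch along those lines to list just the nested kernel parameter names that the param_grid keys must match:
# Print the auto-generated nested parameter names of the composite kernel;
# entries like 'kernel__k1__k1__k1__k1__constant_value' are the exact
# strings to use as keys in param_grid.
for name in sorted(gp.get_params()):
    if name.startswith('kernel__'):
        print(name)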
Finally, I note that in their book Rasmussen and Williams appear to have used leave-one-out cross-validation to tune the hyperparameters.

Kernel Function in Gaussian Processes

Given a kernel in a Gaussian process, is it possible to know the shape of the functions drawn from the prior distribution without sampling first?
I think the best way to know the shape of the prior functions is to draw them. Here's a 1-dimensional example:
[Figure: ten smooth sample paths drawn from the GP prior described below.]
These are samples from the GP prior (the mean is 0 and the covariance matrix is induced by the squared exponential kernel). As you can see, they are smooth, and generally this gives a feeling for how "wiggly" they are. Also note that in the multi-dimensional case each sample will look somewhat like this.
Here's a full code I used, feel free to write your own kernel or tweak the parameters to see how it affects the samples:
import numpy as np
import matplotlib.pyplot as pl
def kernel(a, b, gamma=0.1):
    """GP squared exponential kernel."""
    sq_dist = np.sum(a**2, 1).reshape(-1, 1) + np.sum(b**2, 1) - 2*np.dot(a, b.T)
    return np.exp(-0.5 * (1 / gamma) * sq_dist)
n = 300 # number of points.
m = 10 # number of functions to draw.
s = 1e-6 # noise variance.
X = np.linspace(-5, 5, n).reshape(-1, 1)
K = kernel(X, X)
L = np.linalg.cholesky(K + s * np.eye(n))
f_prior = np.dot(L, np.random.normal(size=(n, m)))
pl.figure(1)
pl.clf()
pl.plot(X, f_prior)
pl.title('%d samples from the GP prior' % m)
pl.axis([-5, 5, -3, 3])
pl.show()

Minimum and maximum values of integer variable

Let's assume a very simple constraint: solve(x > 0 && x < 5).
Can Z3 (or any other SMT solver, or any other automatic technique)
compute the minimum and maximum values of (integer) variable x that satisfies the given constraints?
In our case, the minimum is 1 and the maximum is 4.
Z3 has no support for optimizing (maximizing/minimizing) objective functions or variables.
We plan to add this kind of capability, but it will not happen this year.
In the current version, we can "optimize" an objective function by solving several problems, where in each iteration we add additional constraints. We know we have found the optimum when the problem becomes unsatisfiable. Here is a small Python script that illustrates the idea. The script maximizes the value of a variable X. For minimization, we just have to replace s.add(X > last_model[X]) with s.add(X < last_model[X]). This script is very naive: it performs a linear search. It can be improved in many ways, but it demonstrates the basic idea.
You can also try the script online at: http://rise4fun.com/Z3Py/KI1
See the following related question: Determine upper/lower bound for variables in an arbitrary propositional formula
from z3 import *

# Given formula F, find the model that maximizes the value of X
# using at most M iterations.
def max(F, X, M):
    s = Solver()
    s.add(F)
    last_model = None
    i = 0
    while True:
        r = s.check()
        if r == unsat:
            if last_model is not None:
                return last_model
            else:
                return unsat
        if r == unknown:
            raise Z3Exception("failed")
        last_model = s.model()
        # Ask for a strictly larger value of X in the next iteration.
        s.add(X > last_model[X])
        i = i + 1
        if i > M:
            raise Z3Exception("maximum not found, maximum number of iterations was reached")

x, y = Ints('x y')
F = [x > 0, x < 10, x == 2*y]
print(max(F, x, 10000))
As Leonardo pointed out, this was discussed in detail before: Determine upper/lower bound for variables in an arbitrary propositional formula. Also see: How to optimize a piece of code in Z3? (PI_NON_NESTED_ARITH_WEIGHT related).
To summarize, one can either use a quantified formula, or go iteratively. Unfortunately, these techniques are not equivalent:
The quantified approach needs no iteration and can find the global min/max in a single call to the solver, at least in theory. However, it gives rise to harder formulas, so the backend solver can time out or simply return "unknown".
The iterative approach creates simple formulas for the backend solver to deal with, but it can loop forever if there is no optimal value; the simplest example is trying to find the largest Int value. The quantified version handles this nicely by quickly telling you that there is no such value, while the iterative version would go on indefinitely. This can be a problem if you don't know ahead of time that your constraints have an optimal solution. (Needless to say, a "sufficient" iteration count is typically hard to guess and might depend on random factors, like the seed used by the solver.)
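For illustration, a minimal sketch of the quantified approach on the toy constraint from the question (using z3py; the ForAll asserts that no feasible y exceeds x):
from z3 import *

x, y = Ints('x y')
s = Solver()
s.add(x > 0, x < 5)
# x is maximal iff every feasible y satisfies y <= x.
s.add(ForAll([y], Implies(And(y > 0, y < 5), y <= x)))
print(s.check())   # sat
print(s.model())   # [x = 4]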
Also keep in mind that if there is a custom optimization algorithm for the problem domain at hand, it's unlikely that a general purpose SMT solver can outperform it.
Z3 now supports optimization:
from z3 import *
o = Optimize()
x = Int('x')
o.add(And(x > 0, x < 5))
o.maximize(x)
print(o.check()) # prints sat
print(o.model()) # prints [x = 4]
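The minimum from the question can be obtained the same way with minimize (same Optimize API, by analogy):
o = Optimize()
x = Int('x')
o.add(And(x > 0, x < 5))
o.minimize(x)
print(o.check())  # prints sat
print(o.model())  # prints [x = 1]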
This particular problem is an integer program.
