Provide custom gradient to drake::MathematicalProgram

Drake has an interface where you can give it a generic function as a constraint and it can set up the nonlinearly-constrained mathematical program automatically (as long as it supports AutoDiff). I have a situation where my constraint does not support AutoDiff (the constraint function conducts a line search to approximate the maximum value of some function), but I have a closed-form expression for the gradient of the constraint. In my case, the math works out so that it's difficult to find a point on this function, but once you have that point it's easy to linearize around it.
I know many optimization libraries will allow you to provide your own analytical gradient when available; can you do this with Drake's MathematicalProgram as well? I could not find mention of it in the MathematicalProgram class documentation.
Any help is appreciated!

It's definitely possible, but I admit we haven't provided helper functions that make it pretty yet. Please let me know if/how this helps; I will plan to tidy it up and add it as an example or code snippet that we can reference in drake.
Consider the following code:
from pydrake.all import AutoDiffXd, MathematicalProgram, Solve

prog = MathematicalProgram()
x = prog.NewContinuousVariables(1, 'x')

def cost(x):
    return (x[0] - 1.)*(x[0] - 1.)

def constraint(x):
    if isinstance(x[0], AutoDiffXd):
        print(x[0].value())
        print(x[0].derivatives())
    return x

cost_binding = prog.AddCost(cost, vars=x)
constraint_binding = prog.AddConstraint(
    constraint, lb=[0.], ub=[2.], vars=x)
result = Solve(prog)
When we register the cost or constraint with MathematicalProgram in this way, we are declaring that it can be called with x being either an array of floats or an array of AutoDiffXd objects -- an AutoDiffXd is simply a wrapping of Eigen's AutoDiffScalar (with dynamically allocated derivatives of type double). The snippet above shows roughly how it works -- every scalar value has a vector of (partial) derivatives associated with it. On entry to the function, you are passed x with the derivatives of x set to dx/dx (each entry will be one or zero).
Your job is to return a value, call it y, with the value set to the value of your cost/constraint, and the derivatives set to dy/dx. Normally, all of this happens magically for you. But it sounds like you get to do it yourself.
Here's a very simple code snippet that, I hope, gets you started:
from pydrake.all import AutoDiffXd, MathematicalProgram, Solve

prog = MathematicalProgram()
x = prog.NewContinuousVariables(1, 'x')

def cost(x):
    return (x[0] - 1.)*(x[0] - 1.)

def constraint(x):
    if isinstance(x[0], AutoDiffXd):
        # Construct the return value manually: the value is the constraint
        # evaluated at x, and the derivatives are dy/dx chained with x's
        # own derivatives.
        y = AutoDiffXd(2*x[0].value(), 2*x[0].derivatives())
        return [y]
    return 2*x

cost_binding = prog.AddCost(cost, vars=x)
constraint_binding = prog.AddConstraint(
    constraint, lb=[0.], ub=[2.], vars=x)
result = Solve(prog)
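And here is a sketch closer to your use case: the constraint value comes from a routine that only works with floats (standing in for your line search), while the closed-form gradient is chained in by hand. blackbox_value and blackbox_gradient are hypothetical placeholders, so treat this as a starting point rather than a finished recipe:
from pydrake.all import AutoDiffXd, MathematicalProgram, Solve

def blackbox_value(x_val):
    # Stand-in for the non-differentiable evaluation (e.g. a line search);
    # it only ever sees and returns plain floats.
    return x_val * x_val

def blackbox_gradient(x_val):
    # Closed-form gradient of blackbox_value at x_val.
    return 2.0 * x_val

def constraint(x):
    if isinstance(x[0], AutoDiffXd):
        x_val = x[0].value()
        # Chain rule: dy/d(decision vars) = dy/dx * dx/d(decision vars).
        dy = blackbox_gradient(x_val) * x[0].derivatives()
        return [AutoDiffXd(blackbox_value(x_val), dy)]
    return [blackbox_value(x[0])]

prog = MathematicalProgram()
x = prog.NewContinuousVariables(1, 'x')
prog.AddConstraint(constraint, lb=[0.], ub=[2.], vars=x)
result = Solve(prog)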
Let me know?

Related

Evaluate a symbolic expression inside a MathematicalProgram constraint

I want to use a symbolic expression as a MathematicalProgram constraint but am unsure how to achieve this. My best go so far is the following (simplified example):
x = Variable("x")
expression = x**2
prog = MathematicalProgram()
v = prog.NewContinuousVariables(1)
prog.AddConstraint(
lambda a: Evaluate(np.array([expression]), {x: a[0].value()}),
lb=np.array([0.0]),
ub=np.array([0.0]),
vars=v,
)
result = Solve(prog)
I'm getting the error PyFunctionConstraint: Output must be of scalar type AutoDiffXd. Got float instead. Using lambda a: Evaluate(np.array([expression]), {x: a[0]}) does not work due to incompatible function arguments.
I'd highly appreciate any help with this.
I don't think we currently support adding a symbolic::Expression as a constraint in pydrake yet. On the other hand, we do support ExpressionConstraint in the C++ version of Drake.
May I ask why you would like to impose the constraint using a symbolic Expression? It is generally much faster to evaluate the constraint if you pass a function directly, something like this:
def foo(x):
    return np.array([x[0]**2])

prog.AddConstraint(foo, np.array([0.]), np.array([0.]), vars=v)
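For reference, a self-contained version of that suggestion (reusing the variable names from the question) would be:
import numpy as np
from pydrake.all import MathematicalProgram, Solve

prog = MathematicalProgram()
v = prog.NewContinuousVariables(1)

def foo(x):
    return np.array([x[0]**2])

prog.AddConstraint(foo, np.array([0.]), np.array([0.]), vars=v)
result = Solve(prog)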
Hongkai Dai's answer with the ExpressionConstraint in C++ led me in the right direction. There is such a constraint in pydrake (see here); however, it currently does not support array inputs. The second required insight was that it is possible to use prog.NewContinuousVariables in symbolic expression operations (e.g. Jacobian).
Using these insights, I solved my problem with something similar to the following:
from pydrake.all import MathematicalProgram, Solve

prog = MathematicalProgram()
x = prog.NewContinuousVariables(2)
expression = x[0]**2
J = expression.Jacobian([x[0]])
for i in range(len(J)):
    prog.AddConstraint(J[i], 0.0, 0.0)
result = Solve(prog)

Constraints for Direct Collocation method in pydrake

I am looking for a way to describe the constraints of the Direct Collocation method in pydrake.
I got the robot model from my own URDF by using FindResource, as in this example (lines 11-16).
Then, I tried to write some functions which calculate the positions of the joints, like the swing_foot_height(q) function in this notebook.
However, there is a problem; it may be a type error.
I defined q as follows:
robot = MultibodyPlant(time_step=0.0)
scene_graph = SceneGraph()
robot.RegisterAsSourceForSceneGraph(scene_graph)
file_name = FindResource("models/robot.urdf")
Parser(robot).AddModelFromFile(file_name)
robot.Finalize()
context = robot.CreateDefaultContext()

dircol = DirectCollocation(
    robot,
    context,
    ...(Omission)...
    input_port_index=robot.get_actuation_input_port().get_index())

x = dircol.state()
nq = robot.num_positions()
q = x[0:nq]
Then, I used this q for the function like swing_foot_height(q).
The error looks like this:
SetPositions(): incompatible function arguments. The following argument types are supported:
...
q: numpy.ndarray[numpy.float64[m, 1]]
...
Invoked with:
...
array([Variable('x(0)', Continuous), ... Variable('x(9)', Continuous)],dtype=object)
Are there some way to avoid this error?
Right. In the compass gait notebook that you cited, there was an important line:
# overwrite MultibodyPlant with its autodiff copy
compass_gait = compass_gait.ToAutoDiffXd()
so that the multibody plant being used in the constraint is actually an AutoDiffXd version of the plant.
The littledog notebook has more examples of this, with a more robust implementation that works for both float and autodiff constraint evaluations.
As far as I understand this, the decision variables that DirectCollocation hands you (in your case, x and q) are not plain floats: the objects in the "Invoked with" error message are symbolic placeholder Variables, and when the solver actually evaluates your constraint it passes values of AutoDiffXd type. AutoDiffXd is the type used for automatic differentiation, which is how the gradients are found for the optimization solver.
You'll need to convert back to float to use the SetPositions() function.
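Here is a minimal sketch of that dual-plant pattern, in the spirit of the littledog notebook (the body name "swing_foot" and the returned quantity are hypothetical placeholders; robot and context are the plant and context constructed in the question):
from pydrake.all import AutoDiffXd

# Keep a float plant and an AutoDiffXd copy, and dispatch on the scalar
# type that the solver calls the constraint with.
robot_ad = robot.ToAutoDiffXd()
context_ad = robot_ad.CreateDefaultContext()

def swing_foot_height(q):
    if isinstance(q[0], AutoDiffXd):
        plant, plant_context = robot_ad, context_ad
    else:
        plant, plant_context = robot, context
    plant.SetPositions(plant_context, q)
    foot = plant.GetBodyByName("swing_foot")  # hypothetical body name
    # Return the world z-coordinate of the foot body's origin.
    return [plant.EvalBodyPoseInWorld(plant_context, foot).translation()[2]]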

Maxima: Is there any way to make functions defined within the main function be local, in a similar way to local variables?

I wonder if there is any way to make functions defined within the main function be local, in a similar way to local variables. For example, in this function that calculates the gradient of a scalar function,
grad(var, f) := block([aux],
    aux : [gradient, DfDx[i]],
    gradient : [],
    DfDx[i] := diff(f(x_1, x_2, x_3), var[i], 1),
    for i in [1, 2, 3] do (
        gradient : append(gradient, [DfDx[i]])
    ),
    return(gradient)
)$
The variable gradient that has been defined inside the main function grad(var,f) has no effect outside the main function, as it is inside the aux list. However, I have observed that the function DfDx, despite being inside the aux list, does have an effect outside the main function.
Is there any way to make the sub-functions defined inside the main function to be local only, in a similar way to what can be made with local variables? (I know that one can kill them once they have been used, but perhaps there is a more elegant way)
To address the problem you need to solve here, another way to compute the gradient is to say
grad(var, e) := makelist(diff(e, var1), var1, var);
and then you can say for example
grad([x, y, z], sin(x)*y/z);
to get
[cos(x)*y/z, sin(x)/z, -(sin(x)*y)/z^2]
(There isn't a built-in gradient function; this is an oversight.)
About local functions, bear in mind that all function definitions are global. However you can approximate a local function definition via local, which saves and restores all properties of a symbol. Since the function definition is a property, local has the effect of temporarily wiping out an existing function definition and later restoring it. In between you can create a temporary function definition. E.g.
foo(x) := 2*x;
bar(y) := block(local(foo), foo(x) := x - 1, foo(y));
bar(100); /* output is 99 */
foo(100); /* output is 200 */
However, I don't think you need to use local -- just makelist plus diff is enough to compute the gradient.
There is more to say about Maxima's scope rules, named and unnamed functions, etc. I'll try to come back to this question tomorrow.
To compute the gradient, my advice is to call makelist and diff as shown in my first answer. Let me take this opportunity to address some related topics.
I'll paste the definition of grad shown in the problem statement and use that to make some comments.
grad(var, f) := block([aux],
    aux : [gradient, DfDx[i]],
    gradient : [],
    DfDx[i] := diff(f(x_1, x_2, x_3), var[i], 1),
    for i in [1, 2, 3] do (
        gradient : append(gradient, [DfDx[i]])
    ),
    return(gradient)
)$
(1) Maxima works mostly with expressions as opposed to functions. That's not causing a problem here, I just want to make it clear. E.g. in general one has to say diff(f(x), x) when f is a function, instead of diff(f, x), likewise integrate(f(x), ...) instead of integrate(f, ...).
(2) When gradient and DfDx are to be the local variables, you have to name them in the list of variables for block, e.g. block([gradient, DfDx], ...) -- Maxima won't understand block([aux], aux : ...) as declaring them local.
(3) Note that a function defined with square brackets instead of parentheses, e.g. f[x] := ... instead of f(x) := ..., is a so-called array function in Maxima. An array function is a memoizing function, i.e. if f[x] is called two or more times, the return value is only computed once, and then returned every time thereafter. Sometimes that's a useful optimization when the domain of the function comprises a finite set.
(4) Bear in mind that x_1, x_2, x_3 are distinct symbols, not related to each other, and not related to x[1], x[2], x[3], even if they are displayed the same. My advice is to work with subscripted symbols x[i] when i is a variable.
(5) About building up return values, try to arrange to compute the whole thing at one go, instead of growing the result incrementally. In this case, makelist is preferable to for plus append.
(6) The return function in Maxima acts differently than in other programming languages; it's a little hard to explain. A function returns the value of the last expression which was evaluated, so if gradient is that last expression, you can just write grad(var, f) := block(..., gradient).
Hope this helps; I know it's obscure and complex. The Maxima programming language was not designed before being implemented, and some of the decisions are clearly questionable at a distance of more than 50 years (!). That's okay, they were figuring it out as they went along. There was not a body of established results which could provide a point of reference; the original authors were contributing to what's considered common knowledge today.

Floor and Ceiling Function implementation in Z3

I have tried to implement the floor and ceiling functions as defined in the following link:
https://math.stackexchange.com/questions/3619044/floor-or-ceiling-function-encoding-in-first-order-logic/3619320#3619320
But the Z3 query returns a counterexample.
Floor Function
_X=Real('_X')
_Y=Int('_Y')
_W=Int('_W')
_n=Int('_n')
_Floor=Function('_Floor',RealSort(),IntSort())
..
_s.add(_X>=0)
_s.add(_Y>=0)
_s.add(Implies(_Floor(_X)==_Y,And(Or(_Y==_X,_Y<_X),ForAll(_W,Implies(And(_W>=0,_W<_X),And(_W ==_Y,_W<_Y))))))
_s.add(Implies(And(Or(_Y==_X,_Y<_X),ForAll(_W,Implies(And(_W>=0,_W<_X),And(_W==_Y,_W<_Y))),_Floor(_X)==_Y))
_s.add(Not(_Floor(0.5)==0))
Expected Result - Unsat
Actual Result - Sat
Ceiling Function
_X=Real('_X')
_Y=Int('_Y')
_W=Int('_W')
_Ceiling=Function('_Ceiling',RealSort(),IntSort())
..
..
_s.add(_X>=0)
_s.add(_Y>=0)
_s.add(Implies(_Ceiling(_X)==_Y,And(Or(_Y==_X,_Y<_X),ForAll(_W,Implies(And(_W>=0,_W<_X),And(_W ==_Y,_Y<_W))))))
_s.add(Implies(And(Or(_Y==_X,_Y<_X),ForAll(_W,Implies(And(_W>=0,_W<_X),And(_W==_Y,_Y<_W)))),_Ceiling(_X)==_Y))
_s.add(Not(_Ceilng(0.5)==1))
Expected Result - Unsat
Actual Result - Sat
[Your encoding doesn't load into z3; it gives a syntax error even after eliminating the '..' lines, as your call to Implies needs an extra argument. But I'll ignore all that.]
The short answer is, you can't really do this sort of thing in an SMT solver. If you could, then you could solve arbitrary Diophantine equations: simply cast the problem in terms of Reals, solve it (there is a decision procedure for Reals), and then add the extra constraint that the result is an integer by saying Floor(solution) = solution. By this argument, you can see that modeling such functions is beyond the capabilities of an SMT solver.
See this answer for details: Get fractional part of real in QF_UFNRA
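To make the reduction argument concrete, here's a hedged sketch (the particular equation is just an arbitrary example):
from z3 import *

# If Floor could be fully axiomatized, any Diophantine problem could be
# posed by solving over the (decidable) reals and then forcing the
# solutions to be integers.
x, y = Reals('x y')
Floor = Function('Floor', RealSort(), IntSort())
s = Solver()
s.add(x**3 + y**3 == 29)        # an arbitrary Diophantine equation
s.add(ToReal(Floor(x)) == x)    # "x is an integer"
s.add(ToReal(Floor(y)) == y)    # "y is an integer"
# Together with the Floor axioms, deciding this query would decide the
# Diophantine equation -- which is impossible in general.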
Having said that, this does not mean you cannot code this up in Z3. It just means that it will be more or less useless. Here's how I would go about it:
from z3 import *

s = Solver()
Floor = Function('Floor', RealSort(), IntSort())
r = Real('R')
f = Int('f')
s.add(ForAll([r, f], Implies(And(f <= r, r < f + 1), Floor(r) == f)))
Now, if I do this:
s.add(Not(Floor(0.5) == 0))
print(s.check())
you'll get unsat, which is correct. If you do this instead:
s.add(Not(Floor(0.5) == 1))
print(s.check())
you'll see that z3 simply loops forever. To make this useful, you'd want the following to work as well:
test = Real('test')
s.add(test == 2.4)
result = Int('result')
s.add(Floor(test) == result)
print(s.check())
but again, you'll see that z3 simply loops forever.
So, bottom line: yes, you can model such constructs, and z3 will correctly answer the simplest of queries. But with anything interesting, it'll simply loop forever. (Essentially, whenever you'd expect sat, and in most of the unsat scenarios unless they can be constant-folded away, I'd expect z3 to simply loop.) And there's a very good reason for that, as I mentioned: such theories are just not decidable and fall well outside the range of what an SMT solver can do.
If you are interested in modeling such functions, your best bet is to use a more traditional theorem prover, like Isabelle, Coq, ACL2, HOL, HOL-Light, amongst others. They are much more suited for working on these sorts of problems. And also, give a read to Get fractional part of real in QF_UFNRA as it goes into some of the other details of how you can go about modeling such functions using non-linear real arithmetic.

The tensor product ti() in GAM package gives incorrect results

I am surprised to notice that it is somewhat difficult to obtain a correct fit of an interaction function from gam().
To be more specific, I want to estimate the additive function
y = m_1(x) + m_2(z) + m_{12}(x,z) + u,
where m_1(x) = x^2, m_2(z) = z^2, and m_{12}(x,z) = x*z. The following code generates this model:
test1 <- function(x, z, sx = 1, sz = 1) {
  #--m1(x) function
  m.x <- x^2
  m.x <- m.x - mean(m.x)
  #--m2(z) function
  m.z <- z^2
  m.z <- m.z - mean(m.z)
  #--m12(x,z) function
  m.xz <- x*z
  m.xz <- m.xz - mean(m.xz)
  m <- m.x + m.z + m.xz
  return(list(m = m, m.x = m.x, m.z = m.z, m.xz = m.xz))
}

n <- 1000
a <- 0
b <- 2
x <- runif(n, a, b)/20
z <- runif(n, a, b)
u <- rnorm(n, 0, 0.5)
model <- test1(x, z)
y <- model$m + u
So I fit the model with gam() as:
library(mgcv)
b3 <- gam(y ~ ti(x) + ti(z) + ti(x,z))
vis.gam(b3);title("tensor anova")
#---extracting basis matrix
B.f3<-model.matrix.gam(b3)
#---extracting series estimator
b3.hat<-b3$coefficients
Question: when I plot the estimated functions from gam() above against their true counterparts, I end up with the following:
par(mfrow=c(1,3))
#---m1(x)
B.x<-B.f3[,c(2:5)]
b.x.hat<-b3.hat[c(2:5)]
plot(x,B.x%*%b.x.hat)
points(x,model$m.x,col='red')
legend('topleft',c('Estimate','True'),lty=c(1,1),col=c('black','red'))
#---m2(z)
B.z<-B.f3[,c(6:9)]
b.z.hat<-b3.hat[c(6:9)]
plot(z,B.z%*%b.z.hat)
points(z,model$m.z,col='red')
legend('topleft',c('Estimate','True'),lty=c(1,1),col=c('black','red'))
#---m12(x,z)
B.xz<-B.f3[,-c(1:9)]
b.xz.hat<-b3.hat[-c(1:9)]
plot(x,B.xz%*%b.xz.hat)
points(x,model$m.xz,col='red')
legend('topleft',c('Estimate','True'),lty=c(1,1),col=c('black','red'))
However, the estimated function for m_1(x) is largely different from x^2, and the estimated interaction m_{12}(x,z) is also largely different from the x*z defined in test1 above. The results are the same if I use predict(b3).
I really can't figure this out. Can anybody help me out by explaining why the results end up like this? Greatly appreciated!
First, the problem here is not due to the package, of course. It is closely related to the identification conditions of the smooth functions. One common practice is to impose the assumptions that E(m_j(.)) = 0 for each individual function j = 1, ..., d, and that E(m_ij(x_i, x_j) | x_i) = E(m_ij(x_i, x_j) | x_j) = 0 for i not equal to j. These conditions require one to employ centered basis functions in the series estimator, which the GAM package already does. However, in my case above, the function m(x,z) = x*z defined in test1 does not satisfy these identification assumptions, since the integral of x*z with respect to either x or z is not zero when x and z range from zero to two.
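Written out in LaTeX, the identification conditions and the one that fails here read as follows (in my DGP, z is uniform on (0, 2), so E[z] = 1):
\[
E[m_j(x_j)] = 0 \quad (j = 1, \dots, d), \qquad
E[m_{12}(x, z) \mid x] = E[m_{12}(x, z) \mid z] = 0,
\]
\[
\text{but here} \quad E[xz \mid x] = x\,E[z] = x \neq 0.
\]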
Furthermore, the series estimator allows the individual and interaction functions to be identified if one instead imposes m(0) = 0 or m(0, x_j) = m(x_i, 0) = 0. This can be readily achieved by centering the basis functions around zero. I have tried both cases, and they work well whenever the DGP satisfies the identification conditions.
