I'm writing a function that will identify when a dropped object will hit the ground. It starts at some highness-value y with some initial velocity y0. It takes gravity acceleration (9.81m/s) into account and tries to determine a dt at which y == 0.
The code (below) works fine to determine at what point the mass will hit the ground. However, I had to find out the hard way that I have to use Solver and not Optimize. (result: unknown). Can somebody explain the reason for this?
Shouldn't optimize also be able to find a solution? (obviously there is only one in this scenario)
Here's my code:
from z3 import *
vy0, y0 = Reals("vy0 y0") # initial velocity, initial position
dt, vy, y = Reals("dt vy y") # time(s), acceleration, velocity, position
solver = Solver() # Optmize doesn't work, but why?
solver.add(vy0 == 0)
solver.add(y0 == 3)
solver.add(dt >= 0) # time needs to be positive
solver.add(vy == vy0 - 9.81 * dt) # the velocity is initial - acceleration * time
solver.add(y == y0 + vy/2 * dt) # the position changes with the velocity (s = v/2 * t)
solver.add(y == 0)
# solver.minimize(dt) # Optmize doesn't work, but why?
print(solver.check())
print(solver.model())
Z3's optimizer only works with linear constraints. You have a non-linear term (due to the multiplication involving vy and dt), and thus the optimizing solver gives up with Unknown.
The satisfiability solver, however, can deal with non-linear real constraints; and hence has no problem giving you a model.
For more on non-linear optimization in Z3, simply search for that term. You'll see many folks asking similar questions! Here's one example: z3Opt optimize non-linear function using qfnra-nlsat
(Note that nonlinear optimization is a significantly harder problem for reals as opposed to pure satisfiability. So, it's not just a matter of "not having implemented it yet.")
Related
I am trying to formulate a trajectory optimization problem for a glider, where I want to maximize the average horisontal velocity. I have formulated the system as a drakesystem, and the state vector consists of the position and velocity.
Currently, I have something like the following:
dircol = DirectCollocation(
plant,
context,
num_time_samples=N,
minimum_timestep=min_dt,
maximum_timestep=max_dt,
)
... # other constraints etc
horisontal_pos = dircol.state()[0:2] # Only (x,y)
time = dircol.time()
dircol.AddFinalCost(-w.T.dot(horisontal_pos) / time)
where AddFinalCost() should replace all instances of state() and time() with the final values, as far as I understand from the documentation. min_dt is non-zero and w is a vector of linear weights.
However, I am getting the following error message
Expression (...) is not a polynomial. ParseCost does not support non-polynomial expression.
which makes me think that there is no way of adding the type of cost function that I am looking for. Is there anything that I am missing?
Thank you in advance!
When calling AddFinalCost(e) with e being a symbolic expression, we can only handle it when e is a polynomial function of the state (more precisely, either a quadratic function or a linear function). Hence the error you see complaining that the cost is not polynomial.
You could add the cost like this
def average_speed(v):
x = v[0]
time_steps = v[1:]
return v[0] / np.sum(time_steps)
h_vars = [dircol.timestep[i] for i in range(N-1)]
dircol.AddCost(average_speed, vars=[dircol.state(N-1)[0]] + h_vars)
which uses a function average_speed to evaluate the average speed. You could find example of doing this in https://github.com/RobotLocomotion/drake/blob/e5f3c3e5f7927ef675066d97d3afac55d3481305/bindings/pydrake/solvers/test/mathematicalprogram_test.py#L590
First, the cost function should be a scalar, but you a vector-valued horisontal_pos / time, which has two entries containing both position_x / dt and position_y / dt, namely a vector as the cost. You should instead provide a scalar valued cost.
Second, it is unclear to me why you divide time in the final cost. As far as I understand it, you want the final position to be close to the origin, so something like position_x² + position_y². The code can look like
dircol.AddFinalCost(horisontal_pos[0]**2 + horisontal_pos[1]**2)
I have implemented a very simple linear regression with gradient descent algorithm in JavaScript, but after consulting multiple sources and trying several things, I cannot get it to converge.
The data is absolutely linear, it's just the numbers 0 to 30 as inputs with x*3 as their correct outputs to learn.
This is the logic behind the gradient descent:
train(input, output) {
const predictedOutput = this.predict(input);
const delta = output - predictedOutput;
this.m += this.learningRate * delta * input;
this.b += this.learningRate * delta;
}
predict(x) {
return x * this.m + this.b;
}
I took the formulas from different places, including:
Exercises from Udacity's Deep Learning Foundations Nanodegree
Andrew Ng's course on Gradient Descent for Linear Regression (also here)
Stanford's CS229 Lecture Notes
this other PDF slides I found from Carnegie Mellon
I have already tried:
normalizing input and output values to the [-1, 1] range
normalizing input and output values to the [0, 1] range
normalizing input and output values to have mean = 0 and stddev = 1
reducing the learning rate (1e-7 is as low as I went)
having a linear data set with no bias at all (y = x * 3)
having a linear data set with non-zero bias (y = x * 3 + 2)
initializing the weights with random non-zero values between -1 and 1
Still, the weights (this.b and this.m) do not approach any of the data values, and they diverge into infinity.
I'm obviously doing something wrong, but I cannot figure out what it is.
Update: Here's a little bit more context that may help figure out what my problem is exactly:
I'm trying to model a simple approximation to a linear function, with online learning by a linear regression pseudo-neuron. With that, my parameters are:
weights: [this.m, this.b]
inputs: [x, 1]
activation function: identity function z(x) = x
As such, my net will be expressed by y = this.m * x + this.b * 1, simulating the data-driven function that I want to approximate (y = 3 * x).
What I want is for my network to "learn" the parameters this.m = 3 and this.b = 0, but it seems I get stuck at a local minima.
My error function is the mean-squared error:
error(allInputs, allOutputs) {
let error = 0;
for (let i = 0; i < allInputs.length; i++) {
const x = allInputs[i];
const y = allOutputs[i];
const predictedOutput = this.predict(x);
const delta = y - predictedOutput;
error += delta * delta;
}
return error / allInputs.length;
}
My logic for updating my weights will be (according to the sources I've checked so far) wi -= alpha * dError/dwi
For the sake of simplicity, I'll call my weights this.m and this.b, so we can relate it back to my JavaScript code. I'll also call y^ the predicted value.
From here:
error = y - y^
= y - this.m * x + this.b
dError/dm = -x
dError/db = 1
And so, applying that to the weight correction logic:
this.m += alpha * x
this.b -= alpha * 1
But this doesn't seem correct at all.
I finally found what's wrong, and I'm answering my own question in hopes it will help beginners in this area too.
First, as Sascha said, I had some theoretical misunderstandings. It may be correct that your adjustment includes the input value verbatim, but as he said, it should already be part of the gradient. This all depends on your choice of the error function.
Your error function will be the measure of what you use to measure how off you were from the real value, and that measurement needs to be consistent. I was using mean-squared-error as a measurement tool (as you can see in my error method), but I was using a pure-absolute error (y^ - y) inside of the training method to measure the error. Your gradient will depend on the choice of this error function. So choose only one and stick with it.
Second, simplify your assumptions in order to test what's wrong. In this case, I had a very good idea what the function to approximate was (y = x * 3) so I manually set the weights (this.b and this.m) to the right values and I still saw the error diverge. This means that weight initialization was not the problem in this case.
After searching some more, my error was somewhere else: the function that was feeding data into the network was mistakenly passing a 3 hardcoded value into the predicted output (it was using a wrong index in an array), so the oscillation I saw was because of the network trying to approximate to y = 0 * x + 3 (this.b = 3 and this.m = 0), but because of the small learning rate and the error in the error function derivative, this.b wasn't going to get near to the right value, making this.m making wild jumps to adjust to it.
Finally, keep track of the error measurement as your network trains, so you can have some insight into what's going on. This helps a lot to identify a difference between simple overfitting, big learning rates and plain simple mistakes.
Chapter 4.5.2 of Elements of Statistical Learning
I don't understand what does it mean:
"Since for any β and β0 satisfying these inequalities, any positively scaled
multiple satisfies them too, we can arbitrarily set ||β|| = 1/M."
Also, how does maximize M becomes minimize 1/2(||β||^2) ?
"Since for any β and β0 satisfying these inequalities, any positively scaled multiple satisfies them too, we can arbitrarily set ||β|| = 1/M."
y_i(x_i' b + b0) >= M ||b||
thus for any c>0
y_i(x_i' [bc] + [b0c]) >= M ||bc||
thus you can always find such c that ||bc|| = 1/M, so we can focus only on b such that they have such norm (we simply limit the space of possible solutions because we know that scaling does not change much)
Also, how does maximize M becomes minimize 1/2(||β||^2) ?
We put ||b|| = 1/M, thus M=1/||b||
max_b M = max_b 1 / ||b||
now maximization of positive f(b) is equivalent of minimization of 1/f(b), so
min ||b||
and since ||b|| is positive, its minimization is equivalent to minimization of the square, as well as multiplied by 1/2 (this does not change the optimal b)
min 1/2 ||b||^2
I'm trying to minimize my function "FunctionToMinimize", which is defined as follows:
FunctionToMinimize[a_, b_, c_, d_] := (2.35*Sqrt[
Variance[1/2*
(a*#1 + b*#2 + c*#3 + d*#4)
]
]
/Mean[1/2*(a*#1 + b*#2 + c*#3 + d*#4)])
&[DataList1[[1 ;; 1000]],DataList2[[1 ;; 1000]],
DataList3[[1 ;; 1000]], DataList4[[1 ;; 1000]]]
The four parameters a,b,c and d are restricted to be somewhere between 0.5 and 1.5. My Problem is now, that if I call
NMinimize[{Funktion[w, x, y, z],
0.75 < w < 1.25 && 0.75 < y < 1.25 && 0.75 < x < 1.25 && 0.75 < z < 1.25},
{w, x, y, z}]
the Mathematica kernel shuts down because it has not enough memory. If I use only the first 100 entries in my DataLists, it will find me results (in 4.1 sec), but if I use DataList[[1;;1000]] or even more entries, the kernel crashes.
Has anybody an idea, why the NMinimize function uses so much memory? I would need to have the minimization for 150'000 events in each list...
Thanks for your answer,
Cheers,
Andreas
I would guess (but haven't in any way checked) that the problem is that on each call to your function, Mathematica is trying to construct a symbolic expression derived from all your data and that occupies much more memory than you'd expect.
Regardless, the good news -- if you haven't long since moved on and forgotten about this problem -- is that you can turn the function into something much simpler.
So, first of all, the 2.35 and the 1/2s just change your function by a constant factor and don't affect where the minimum is, so let's ignore them. Next, your function is always non-negative, so minimizing it is the same as minimizing its square, so let's do that.
So now you're trying to minimize var(aw+bx+cy+dz)/mean(aw+bx+cy+dz)^2 where w,x,y,z are (perhaps quite long) vectors.
Now your numerator and denominator are both just quadratic forms in a,b,c,d whose coefficients depend (in fixed ways) on those vectors. Specifically, suppose your vectors have length N. Then your function is just
[sum(aw+bx+cy+dz)^2/N - sum(aw+bx+cy+dz)^2/N^2] / (sum(aw+bx+cy+dz)^2/N^2)
which you might prefer to write as N sum(aw+bx+cy+dz)^2 / sum(aw+bx+cy+dz)^2 - 1
and in that fraction, e.g., the coefficient of bc in the numerator is 2 sum(xy), and the coefficient in the denominator is 2 sum(x) sum(y).
So you can take your big vectors, compute the relevant coefficients once, and then just ask Mathematica to optimize a function of the form (quadratic / quadratic), which should be pretty painless.
Let's assume a very simple constraint: solve(x > 0 && x < 5).
Can Z3 (or any other SMT solver, or any other automatic technique)
compute the minimum and maximum values of (integer) variable x that satisfies the given constraints?
In our case, the minimum is 1 and the maximum is 4.
Z3 has not support for optimizing (maximizing/minimizing) objective functions or variables.
We plan to add this kind of capability, but it will not happen this year.
In the current version, we can "optimize" an objective function by solving several problems where in each iteration we add additional constraints. We know we found the optimal when the problem becomes unsatisfiable. Here is a small Python script that illustrates the idea. The script maximizes the value of a variable X. For minimization, we just have to replace s.add(X > last_model[X]) with s.add(X < last_model[X]). This script is very naive, it performs a "linear search". It can be improved in many ways, but it demonstrates the basic idea.
You can also try the script online at: http://rise4fun.com/Z3Py/KI1
See the following related question: Determine upper/lower bound for variables in an arbitrary propositional formula
from z3 import *
# Given formula F, find the model the maximizes the value of X
# using at-most M iterations.
def max(F, X, M):
s = Solver()
s.add(F)
last_model = None
i = 0
while True:
r = s.check()
if r == unsat:
if last_model != None:
return last_model
else:
return unsat
if r == unknown:
raise Z3Exception("failed")
last_model = s.model()
s.add(X > last_model[X])
i = i + 1
if (i > M):
raise Z3Exception("maximum not found, maximum number of iterations was reached")
x, y = Ints('x y')
F = [x > 0, x < 10, x == 2*y]
print max(F, x, 10000)
As Leonardo pointed out, this was discussed in detail before: Determine upper/lower bound for variables in an arbitrary propositional formula. Also see: How to optimize a piece of code in Z3? (PI_NON_NESTED_ARITH_WEIGHT related).
To summarize, one can either use a quantified formula, or go iteratively. Unfortunately, these techniques are not equivalent:
Quantified approach needs no iteration, and can find global min/max in a single call to the solver; at least in theory. However, it does give rise to harder formulas. So, the backend solver can time-out, or simply return "unknown".
Iterative approach creates simple formulas for the backend solver to deal with, but it can loop forever if there's no optimal value; simplest example being trying to find the largest Int value. Quantified version can solve this problem nicely by quickly telling you that there is no such value, while the iterative version would go on indefinitely. This can be a problem if you don't know ahead of time that your constraints do have an optimal solution. (Needless to say, the "sufficient" iteration count is typically hard to guess, and might depend on random factors, like the seed used by the solver.)
Also keep in mind that if there is a custom optimization algorithm for the problem domain at hand, it's unlikely that a general purpose SMT solver can outperform it.
z3 now supports optimization.
from z3 import *
o = Optimize()
x = Int( 'x' )
o.add(And(x > 0, x < 5))
o.maximize(x)
print(o.check()) # prints sat
print(o.model()) # prints [x = 4]
This particular problem is an integer program.