When reasoning about polynomial inequalities, Z3 seems to have to first transform the polynomial into monomial form. I'm wondering if there's a setting in the solver that lets me define the monomial degree I want my polynomials to be transformed to?
I'm using the z3py interface and I can't find it by searching online.
No, Z3 does not have this setting.
I'm working on implementing a trajectory optimization algorithm named DIRTREL, which is essentially direct transcription with an added cost function. However, the cost function incorporates the K matrices obtained by linearizing the system around the decision variables (x, u) and applying discrete time-varying LQR. My question is how to express this in Drake most efficiently and concisely: my current approach describes the system symbolically and produces extremely lengthy symbolic expressions (which will only grow with more timesteps) due to the recursive nature of the Riccati difference equation, and I am not sure whether this symbolic approach is even appropriate.
For more details:
Specify my system as a LeafSystem
Declare a MathematicalProgram with decision variables x, u
To obtain time-varying linearized dynamics, specify a class that takes in the dynamics and decision variables at a single timestep and returns Jacobians for that timestep with symbolic.Jacobian(args)
Add a cost function which takes in the entire trajectory, i.e. all x, u
Inside the cost function:
Obtain linearized matrices A_i, B_i, G_i (G_i for noise) for each timestep by using the class that takes in decision variables and returns Jacobians
Compute the TVLQR cost-to-go (S[n]) with the Riccati difference equations, using the A_i and B_i matrices and solving for the K matrices (a minimal numerical sketch of this recursion is given just after this list)
Return a cost for the mathematical program that is essentially a large linear combination of the K matrices
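For reference, here is a minimal numerical sketch of what I mean by that recursion (my own illustration in plain NumPy, not Drake-specific; Q, R, Qf and the A_list/B_list names are placeholders):

import numpy as np

def tvlqr_gains(A_list, B_list, Q, R, Qf):
    # Backward Riccati recursion for discrete time-varying LQR.
    # The "inverse" only appears as a linear solve, so no explicit
    # matrix inverse is ever formed.
    N = len(A_list)
    S = [None] * (N + 1)
    K = [None] * N
    S[N] = Qf
    for i in reversed(range(N)):
        A, B = A_list[i], B_list[i]
        # K[i] = (R + B^T S[i+1] B)^{-1} B^T S[i+1] A
        K[i] = np.linalg.solve(R + B.T @ S[i + 1] @ B, B.T @ S[i + 1] @ A)
        S[i] = Q + A.T @ S[i + 1] @ A - A.T @ S[i + 1] @ B @ K[i]
    return K, S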
One side note: I am not sure of the most tractable way to compute an inverse symbolically, but I am mostly concerned with my methodology and whether this symbolic description is appropriate.
I think there are several details on DIRTREL worth discussing:
The cost-to-go matrix S[n] depends on the linearized dynamics A_i, B_i. I think in DIRTREL you will need to solve a nonlinear optimization problem, which requires the gradient of the cost. So to compute the gradient of your cost, you will need the gradient of S[n], which requires the gradients of A_i and B_i. Since A_i and B_i are themselves gradients of the dynamics function f(x, u), you will need to compute the second-order gradient of the dynamics.
We had a paper on trajectory optimization that optimizes a cost function related to the LQR cost-to-go. DIRTREL made several improvements upon our paper. In our implementation, we treated S as a decision variable as well, so our decision variables were x, u, S, with constraints including both the dynamics constraint x[n+1] = f(x[n], u[n]) and the Riccati equation as a constraint on S. I think DIRTREL's approach scales better, with fewer decision variables, but I haven't compared the numerical performance of the two approaches.
I am not sure why you need to compute the inverse symbolically. First, what is the inverse you need to compute? Second, Drake supports automatic differentiation to compute gradients numerically. I would recommend numerical computation instead of symbolic computation: in numerical optimization you only need the value and the gradient of the cost/constraints, and it is usually much more efficient to compute these numerically than to first derive a symbolic expression and then evaluate it.
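As a hedged illustration of the numerical route: the sketch below assumes a recent pydrake in which pydrake.autodiffutils exposes InitializeAutoDiff / ExtractValue / ExtractGradient (names and return shapes have changed across Drake versions), and dirtrel_cost is just a hypothetical stand-in for the real cost.

import numpy as np
from pydrake.autodiffutils import InitializeAutoDiff, ExtractValue, ExtractGradient

def dirtrel_cost(xu):
    # Hypothetical stand-in for the real DIRTREL cost: anything built from
    # ordinary arithmetic on the entries of xu propagates derivatives
    # automatically when the entries are AutoDiffXd scalars.
    return xu.dot(xu)

xu0 = np.array([1.0, 2.0, 3.0])
xu_ad = InitializeAutoDiff(xu0).reshape(-1)   # seed d(xu)/d(xu) = identity
c = np.array([[dirtrel_cost(xu_ad)]])         # wrap the scalar result as a 1x1 array
print(ExtractValue(c))                        # cost value
print(ExtractGradient(c))                     # gradient of the cost w.r.t. xu0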
I want to find the n-dimensional point (x1...xn) in integer space that satisfies some properties, while also maximizing the minimum distance between x and any element of a collection of m (pre-defined/constant) n-dimensional points (z11...z1n, z21...z2n... zm1...zmn). Is there a way to do this using Z3?
Sure. See: https://rise4fun.com/Z3/tutorial/optimization
The above link talks about the SMT-LIB interface, but the same functionality is also available from the Python interface (and from most other bindings to Z3).
Note that optimization is largely for linear properties. If you have non-linear terms, you might want to reformulate them so that a linear counterpart can be optimized instead. Even with non-linear terms you might get good results; it is impossible to know without trying.
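To make that concrete, here is a small z3py sketch (my own, with made-up example data) of the max-min formulation, using If-based absolute values so the distance stays linear (L1 distance rather than Euclidean):

from z3 import Ints, Int, Optimize, If, Sum, sat

def l1_dist(x, z):
    # |x_i - z_i| encoded with If, which keeps the term linear for the optimizer
    return Sum([If(xi - zi >= 0, xi - zi, zi - xi) for xi, zi in zip(x, z)])

zs = [(0, 0), (4, 1), (2, 5)]        # the m pre-defined points (example data)
x = Ints('x0 x1')                    # the n-dimensional integer point we search for
d = Int('d')                         # lower bound on all the distances

opt = Optimize()
opt.add([xi >= 0 for xi in x] + [xi <= 10 for xi in x])   # stand-in "properties"
for z in zs:
    opt.add(d <= l1_dist(x, z))
opt.maximize(d)                      # maximize the minimum distance

if opt.check() == sat:
    print(opt.model())

Squared Euclidean distance would also work, but then the objective involves non-linear terms, which is exactly the case the answer above warns about.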
In Ordinary Least Squares estimation, the assumption is that the samples matrix X (of shape N_samples x N_features) has "full column rank".
This is apparently needed so that the linear regression can be reduced to a simple algebraic equation using the Moore–Penrose inverse. See this section of the Wikipedia article for OLS:
https://en.wikipedia.org/wiki/Ordinary_least_squares#Estimation
In theory this means that if all columns of X (i.e. the features) are linearly independent, the assumption holds and OLS is simple to calculate, correct?
What does this mean in practice?
Does this mean that OLS is not calculable and will result in an error for such input data X? Or will the result just be bad?
Are there any classical datasets for which linear regression fails due to this assumption not being true?
The full rank assumption is only needed if you use the inverse (or a Cholesky decomposition, QR, or any other method that is (mathematically) equivalent to computing the inverse). If you use the Moore-Penrose inverse you will still compute an answer. When the full rank assumption is violated there is no longer a unique answer, i.e. there are many x that minimise
||A*x - b||
The one you will compute with the Moore-Penrose inverse is the x of minimum norm. See here, for example.
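A small NumPy illustration (my own example data) of that minimum-norm behaviour:

import numpy as np

# X has two identical columns, so it does not have full column rank (rank 2, 3 features).
X = np.array([[1.0, 2.0, 2.0],
              [1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0],
              [1.0, 3.0, 3.0]])
y = np.array([5.0, 1.0, 3.0, 7.0])

# lstsq uses the SVD (effectively the Moore-Penrose pseudoinverse) and returns
# the minimum-norm solution among the infinitely many minimisers of ||X*b - y||.
b, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(rank)   # 2, confirming the rank deficiency
print(b)      # ~[1, 1, 1]: the weight is split equally over the duplicated columns

# The textbook normal-equations route fails here, because X^T X is singular:
# np.linalg.solve(X.T @ X, X.T @ y)   # raises numpy.linalg.LinAlgError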
What is the usual precision for Real variables in Z3? Is exact arithmetic used?
Is there a way to set the accuracy level manually?
If Real means that exact arithmetic must be used, is there any other data type for floating point values which has limited precision?
Finally, from this point of view, is Z3 different from other popular SMT solvers, or is this standardised in the SMT-LIB definition?
See this answer: z3 existential theory of the reals
Regarding printing precision, see this one: algebraic reals: does z3 do rounding when pretty printing?
In short, yes: they are represented precisely, as roots of polynomials. Not every real number can be represented by the Real type (transcendentals such as e and pi cannot), but all algebraic numbers, i.e. roots of polynomials with rational coefficients, are representable.
This paper discusses how to also deal with transcendentals.
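A tiny z3py illustration (mine, not from the linked answers) of the exact representation, and of the fact that printing precision is only about display:

from z3 import Real, Solver, set_option, sat

x = Real('x')
s = Solver()
s.add(x * x == 2, x > 0)
print(s.check())     # sat
m = s.model()
print(m[x])          # shown as a truncated decimal ending in '?', but stored
                     # exactly as the positive root of x^2 - 2
set_option(precision=30)
print(m[x])          # same exact value, just more digits displayed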
I'm studying Markov Random Fields, and, apparently, inference in MRF is hard / computationally expensive. Specifically, Kevin Murphy's book Machine Learning: A Probabilistic Perspective says the following:
"In the first term, we fix y to its observed values; this is sometimes called the clamped term. In the second term, y is free; this is sometimes called the unclamped term or contrastive term. Note that computing the unclamped term requires inference in the model, and this must be done once per gradient step. This makes training undirected graphical models harder than training directed graphical models."
Why are we performing inference here? I understand that we're summing over all y's, which seems expensive, but I don't see where we're actually estimating any parameters. Wikipedia also talks about inference, but only about calculating the conditional distribution and needing to sum over all non-specified nodes... but that's not what we're doing here, is it?
Alternatively, does anyone have good intuition for why inference in MRFs is difficult?
Sources:
Chapter 19 of ML:PP: https://www.cs.ubc.ca/~murphyk/MLbook/pml-print3-ch19.pdf
Specific section seen below
When training your CRF, you want to estimate your parameters, \theta.
In order to do this, you can differentiate your loss function (Equation 19.38) with respect to \theta, set it to 0, and solve for \theta.
You can't solve this equation for \theta analytically, though. You can, however, minimise Equation 19.38 by gradient descent. Since the loss function is convex, gradient descent is guaranteed to find the globally optimal solution when it converges.
Equation 19.41 is the actual gradient which you need to compute in order to do gradient descent. The first term is easy (and computationally cheap) to compute, since you are summing over the observed values of y. The second term, however, requires you to do inference: you are not summing over the observed value of y as in the first term, but over all possible configurations of y, weighted by the current model. In other words, you need an expectation of the potential functions under the model, and computing that expectation is exactly the inference problem, which is expensive because the number of configurations grows exponentially with the number of nodes.
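For reference, the gradient being discussed (Murphy's Eq. 19.41, up to notation and the overall sign convention of log-likelihood vs. negative log-likelihood) has the clamped/unclamped form

\frac{\partial \ell}{\partial \theta}
  \;=\; \frac{1}{N} \sum_{i=1}^{N}
        \Big[ \phi(y_i, x_i) \;-\; \mathbb{E}_{p(y \mid x_i, \theta)}\big[\phi(y, x_i)\big] \Big],

where the first (clamped) term just sums the feature values at the observed labels, while the second (unclamped) term is an expectation under the model, i.e. the quantity that needs inference at every gradient step.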