I have two large collections of Z3 integer variables. Call them collection A and collection B. (Group membership is known in advance, so there's no need to use Z3 sets.) I need to generate assertions that ensure no element in A is equal to an element in B. The obvious way is the following:
for all a in A:
    for all b in B:
        solver.add(a != b);
However, the collections are large and this would add over 20 million assertions, so it's not an option.
Here's another approach I came up with that only involves asserting a total of O(n+m) clauses:
a = ctx.int_const("a");
a_def = (a == A[0] || a == A[1] || ... || a == A[n]);
b = ctx.int_const("b");
b_def = (b == B[0] || b == B[1] || ... || b == B[m]);
solver.add(z3::forall(a, b, z3::implies(a_def && b_def, a != b)));
Is there a more efficient way to do this? It seems like the above approach presents the relationship between A and B in an indirect way to the solver, which I worry will hurt performance.
I think your best bet is to directly use Distinct (https://z3prover.github.io/api/html/namespacez3py.html#a9eae89dd394c71948e36b5b01a7f3cd0):
s.add(Distinct(*(A+B)))
And z3 will internally handle this (hopefully!) with a pseudo-boolean encoding and be quite efficient. Note that this is just one clause, though internally z3 will transform it to an efficient form.
Another option, since you're already starting with the assumption that A and B are sets, is to use min(m,n) calls to Distinct:
for a in A:
    s.add(Distinct(a, *B))
Obviously, do this loop over the shorter list, thus creating min(m,n) assertions.
Your O(m+n) solution can work pretty well too. I don't see any obvious reasons why it shouldn't. (Although, the presence of quantifiers makes life hard for SMT solvers. So, your mileage might vary, depending on the other constraints in the system.)
With anything performance related, it's best to do some experiments and see what works the best. I think it'll really depend on what other constraints you have on these elements and how well these new assertions will play with them. Without knowing your entire constraint set, it's anyone's guess.
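If you want a quick starting point for such experiments, here is a minimal z3py sketch of both encodings on toy data; the collection sizes and variable names below are made up purely for illustration:
from z3 import *

# Toy data: both collections are assumed to contain mutually distinct elements,
# matching the "A and B are sets" assumption above.
A = [Int('a_%d' % i) for i in range(3)]
B = [Int('b_%d' % i) for i in range(5)]

s = Solver()

# Encoding 1: a single Distinct over everything (also forces elements within
# each collection to differ from each other).
# s.add(Distinct(*(A + B)))

# Encoding 2: one Distinct per element of the shorter collection.
short, other = (A, B) if len(A) <= len(B) else (B, A)
for x in short:
    s.add(Distinct(x, *other))

print(s.check())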
Disclaimer: This is a rather theoretical question, but I think it fits here; in case not, let me know of an alternative :)
Z3 seems expressive
Recently, I realized I can specify this type of formulae in Z3:
Exists x,y::Integer s.t. [Exists i::Integer s.t. (0<=i<|seq|) & (avg(seq)+t<seq[i])] & (y<Length(seq)) & (y<x)
Here is the code (in Python):
from z3 import *
#Average function
IntSeqSort = SeqSort(IntSort())
sumArray = RecFunction('sumArray', IntSeqSort, IntSort())
sumArrayArg = FreshConst(IntSeqSort)
RecAddDefinition( sumArray
                , [sumArrayArg]
                , If(Length(sumArrayArg) == 0
                    , 0
                    , sumArrayArg[0] + sumArray(SubSeq(sumArrayArg, 1, Length(sumArrayArg) - 1))
                    )
                )

def avgArray(arr):
    return ToReal(sumArray(arr)) / ToReal(Length(arr))
###The specification
t = Int('t')
y = Int('y')
x = Int('x')
i = Int('i') #Has to be declared, even if it is only used in the Existential
seq = Const('seq', SeqSort(IntSort()))
avg_seq = avgArray(seq)
phi_0 = And(2<t, t<10)
phi_1 = And(0 <= i, i< Length(seq))
phi_2 = (t+avg_seq<seq[i])
big_vee = And([phi_0, phi_1, phi_2])
phi = Exists(i, big_vee)
phi_3 = (y<Length(seq))
phi_4 = (y>x)
union = And([big_vee, phi_3, phi_4])
phiTotal = Exists([x,y], union)
s = Solver()
s.add(phiTotal)
print(s.check())
#s.model()
solve(phiTotal) #prettier display
We can see that it outputs sat and produces a model.
But...
However, even though this expressiveness is useful (at least for me), there is something I am missing: formalization.
I mean, I am combining first-order theories that have different signatures and semantics: a sequence-like theory, an integer arithmetic theory, and an (uninterpreted?) function avg. Thus, I would like to combine these theories with a Nelson-Oppen-like procedure, but that procedure only works with quantifier-free fragments.
I guess this combined theory is semi-decidable (because of the quantifiers and because of the sequences), but can we formalize it? If so, I would like to combine these theories correctly, but I have no idea how.
An exercise (as an orientation)
Thus, in order to understand this, I set myself a simpler exercise: take the decidable array property fragment (What's decidable about arrays? http://theory.stanford.edu/~arbrad/papers/arrays.pdf), which has a particular set of formulae and a particular signature.
Now, suppose I want to add an avg function to it. How can I do it?
Do I have to somehow combine the array property fragment with some kind of recursive function theory and an integer theory? How would I do this? Note that these theories involve quantifiers.
Do I have to first combine these theories and then create a decision procedure for the combined theory? How?
Maybe it suffices to create a decision procedure within the array property fragment?
Or maybe a syntactic addition to the signature suffices?
Also, is the array property fragment with an avg function still decidable?
A non-answer answer: Start by reading Chapter 10 of https://www.decision-procedures.org/toc/
A short answer: Unless your theory supports quantifier-elimination, SMT solvers won't have a decision procedure. Assuming they all admit quantifier-elimination, then you can use Nelson-Oppen. Adding functions like avg etc. does not add significantly to expressive power: they are definitions that are "unfolded" as needed, and so long as you don't need induction, they're more or less conveniences. (This is a very simplified account, of course. In practice, you'll most likely need induction for any interesting property.)
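To make the "definitions are unfolded" point concrete, here is a small z3py sketch (my own toy example, not the question's code) where avg over a fixed-size collection is just a Python helper that expands into plain arithmetic at every use site, so no new theory symbol is involved:
from z3 import *

# avg is a definition that is expanded ("unfolded") wherever it is used;
# the collection size is fixed, so no recursion or quantifiers are needed.
xs = [Int('x_%d' % i) for i in range(4)]

def avg(vs):
    return Sum([ToReal(v) for v in vs]) / RealVal(len(vs))

s = Solver()
s.add([v >= 0 for v in xs])
s.add(avg(xs) > 10)   # expands to (x_0 + x_1 + x_2 + x_3) / 4 > 10
print(s.check())
print(s.model())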
If these are your concerns, it's probably best to move to more expressive systems than push-button solvers. Start looking at Lean: Sure, it's not push-button, but it provides a very nice framework for general purpose theorem proving: https://leanprover.github.io
An even longer answer is possible, but Stack Overflow isn't the right forum for it. You're now looking into the theory of decision procedures and theorem proving, something that won't fit into any answer in any internet-based forum. (Though https://cstheory.stackexchange.com might be better, if you want to give it a try.)
I have been working with the optimizer in Z3py, using only Z3 ints and (x < y)-like constraints in my project, and it has worked really well. With up to 26 variables (Z3 ints) and at least 100 soft constraints, it takes the solver about 5 seconds to find a solution. But now I tried with 49 variables, and it does not solve it at all (I shut it down after 1 hour).
So I made a little experiment to find out what is slowing it down: is it the number of variables or the number of soft constraints? It seems like the bottleneck is the number of variables.
I created 26 Z3 ints. Then I added hard constraints that no variable may be lower than 1 or higher than 26, and that all values must be distinct. No other constraints were added at all.
In other words, the solution the solver will find is simply the numbers 1 through 26 assigned in some order that the solver works out.
This is a simple task; there are really no constraints except those I mentioned, and the solver solves it in about 0.4 seconds, which is fast and entirely expected. But if I increase the number of variables to 49 (with the bounds now being 1 to 49, of course), it takes the solver about a minute. That seems really slow for such a simple task. Should it be like this? Does anybody know why the time grows so dramatically?
(I know that I can use Solver() instead of Optimizer() for this particular experiment, and it will be solved within a second, but in reality I need it to be done with Optimizer since I have a lot of soft constraints to work with.)
EDIT: Adding some code for my example.
I declare an array with Z3 ints that I call "reqs".
The array is consisting of 26 variables in one example and 49 in the other example I am talking about.
solver = Optimize()

for i in reqs:
    solver.add(i >= 1)

for i in reqs:
    solver.add(i <= len(reqs))

d = Distinct(reqs)
solver.add(d)

res = solver.check()
print(res)
Each benchmark is unique, and it's impossible to come up with a good strategy that applies equally well in all cases. But the scenario you describe is simple enough to deal with. The performance problem comes from the fact that Distinct creates too many inequalities (quadratic in number) for the solver, and the optimizer is having a hard time dealing with them as you increase the number of variables.
As a rule of thumb, you should avoid using Distinct if you can. For this particular case, it'd suffice to impose a strict ordering on the variables. (Of course, this may not always be possible depending on your other constraints, but it seems what you're describing can benefit from this trick.) So, I'd code it like this:
from z3 import *

reqs = [Int('i_%d' % i) for i in range(50)]

solver = Optimize()
for i in reqs:
    solver.add(i >= 1, i <= len(reqs))

for i, j in zip(reqs, reqs[1:]):
    solver.add(i < j)

res = solver.check()
print(res)
print(solver.model())
When I run this, I get:
$ time python a.py
sat
[i_39 = 40,
i_3 = 4,
...
i_0 = 1,
i_2 = 3]
python a.py 0.27s user 0.09s system 98% cpu 0.365 total
which is pretty snappy. Hopefully you can generalize this to your original problem.
I am trying to optimize an instance of the Set Covering Problem (SCP41) with Z3py, using minimize.
Here is what I have tried, and the results:
(1) I know that Z3 supports optimization (https://rise4fun.com/Z3/tutorial/optimization). Many times I get to the optimum in SCP41 and other instances; a few times I do not.
(2) I understand that if I use the Z3py API without the optimization module, I would have to do the typical sequential search described by @Leonardo de Moura in (Minimum and maximum values of integer variable). It never gives me results.
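For reference, the sequential search I mean looks roughly like this (just a sketch; the function and variable names are illustrative, not my actual SCP code):
from z3 import *

# Keep asking for a strictly smaller objective value until the solver says unsat.
def minimize_sequentially(constraints, objective):
    s = Solver()
    s.add(constraints)
    best = None
    while s.check() == sat:
        best = s.model().eval(objective, model_completion=True)
        s.add(objective < best)   # demand a strictly better value next round
    return best                   # None means the constraints were unsatisfiable

x, y = Ints('x y')
print(minimize_sequentially([x >= 4, y >= 6], x + y))   # prints 10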
My approach
(3) I have tried to improve the sequential search approach by implementing a binary search similar to the one @Philippe explains in (Does Z3 have support for optimization problems), but when I run my algorithm it just waits and I do not get any result.
I understand that the binary search should be faster and should work in this case? I also know that the SCP41 instance is quite big, that many constraints are generated, and that it becomes extremely combinatorial. This is my full code (Code large instance), and this is my binary search:
def min(F, Z, M, LB, UB, C):
    i = 0
    s = Solver()
    s.add(F[0])
    s.add(F[1])
    s.add(F[2])
    s.add(F[3])
    s.add(F[4])
    s.add(F[5])
    r = s.check()
    if r == sat:
        UB = s.model()[Z]
    while int(str(LB)) <= int(str(UB)):
        C = int(( int(str(LB)) + int(str(UB)) / 2))
        s.push()
        s.add( Z > LB, Z <= C)
        r = s.check()
        if r == sat:
            UB = Z
            return s.model()
        elif r == unsat:
            LB = C
            s.pop()
        i = i + 1
        if (i > M):
            raise Z3Exception("maximum not found, maximum number of iterations was reached")
    return unsat
And this is another instance (Code short instance) that I used in initial tests; it worked well in every case.
What is incorrect: the binary search, or some Z3 concept I have not applied correctly?
regards,
Alex
I don't think your problem has to do with minimization itself. If you put a print(r) right after r = s.check() in your program, you will see that z3 simply struggles to return a result. So your loop doesn't even execute once.
It's impossible to read through your program since it's really large! But I see a ton of things of the form:
Or(X250 == 0, X500 == 1)
This suggests your variables X250, X500, etc. (and there's a ton of them) are actually booleans, not integers. If that is indeed true, you should absolutely stick to booleans. Solving integer constraints is significantly harder than solving pure boolean constraints, and when you use integers to model booleans like this, the underlying solver ends up exploring a search space that is far larger than necessary.
If this is indeed the case, i.e., if you're using Int values to model booleans, I'd strongly recommend modelling your problem to get rid of the Int values and just use booleans. If you come up with a "small" instance of the problem, we can help with modeling.
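For instance, a tiny set-covering instance modelled with Bool selection variables might look like this; the sets, costs, and names below are entirely made up, just to show the shape of the encoding:
from z3 import *

# Toy set-cover sketch: pick a cheapest collection of sets covering all elements.
sets = {'s1': [1, 2], 's2': [2, 3], 's3': [1, 3, 4]}
cost = {'s1': 2, 's2': 1, 's3': 3}
elements = [1, 2, 3, 4]

pick = {name: Bool('pick_%s' % name) for name in sets}

opt = Optimize()
for e in elements:
    # every element must be covered by at least one chosen set containing it
    opt.add(Or([pick[name] for name, members in sets.items() if e in members]))
opt.minimize(Sum([If(pick[name], cost[name], 0) for name in sets]))

print(opt.check())
print(opt.model())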
If you truly do need Int values (which might very well be the case), then I'd say your problem is simply too difficult for an SMT solver to deal with efficiently. You might be better off using some other system that is tuned for such optimization problems.
So I have a variable, let's call it 'ID'. I need to check this value against a fixed set of values. The ID, of course, can only match one of the values, so there isn't an issue with stopping on the first matching value, as none of the others would match. There is also a chance that the variable does not match any of the given values. My question, then, is what is the most resource-efficient way to do this? I can think of two easy ways of tackling the problem. Since I know the values at the time of programming, I can set up a conditional with 'or' that just checks each value, like so:
if (ID == "1" or ID == "16" or ID == "58") then
--do something--
end
The problem with this is that it's quite verbose and tedious to write. The other option involves a foreach loop where I define a table beforehand.
values = {"1", "16", "58"}
for _, value in ipairs(values) do
    if (ID == value) then
        return true
    end
end
The upside to this is that it's reusable, which is good since I'll need to do this exact check with a different set of values at least 10 times; the downside is I suspect it takes more resources.
Any help would be greatly appreciated.
Tables can be used as sets:
interesting = {
    ["1"] = true, ["16"] = true, ["58"] = true
}
if interesting[ID] then
    -- ...
end
While it eats more memory (80 bytes per empty table plus 32 bytes per entry (IIRC, on x86_64), with the number of entries rounded up to the next power of two, versus 16 bytes per comparison plus storage for the value you compare against), the lookup happens in C and is therefore faster than a chain of comparisons executed as a sequence of Lua instructions (at least once things get larger).
For small numbers of values, this doesn't really matter. (If you are CPU-bound and this is very important in your case, measure in the context of your program and see what performs better. Don't put too much weight on micro-benchmarks with this – cache behavior in particular might produce funny effects here.)
For large numbers of comparisons, this is the right approach. It's also more flexible than if-then-else chains. (You can change things at runtime without reloading code.)
Also note that the value you use to anchor an element in the set doesn't really matter, so a relatively common idiom (especially for input handling) is putting the action as a function into the table:
keybindings = {
    left = function() Player:move_left( ) end,
    right = function() Player:move_right( ) end,
    up = function() Player:jump( ) end,
    -- ...
}
function onKey( k )
    local action = keybindings[k]
    if action then action( ) end
end
While this certainly is slower than a direct comparison and inline code, speed is essentially irrelevant here (key handling generally happens much less often than ~100 times per second) and flexibility is of high value.
I'm new to Z3, so this is likely a silly question. I'm trying to model an execution flow of a program. Doing this through manual z3 calls for now. That said, I end up trying to model something like the following:
x = 1
x += 1
Performing the following commands gives me unsat, and I understand why.
import z3
x = z3.Int('x')
s = z3.Solver()     # solver was implied in the original snippet
s.add(x == 1)
s.add(x == x + 1)   # conflicts with x == 1, hence unsat
On a small scale, it might be reasonable to manually change x == 1 to x == 2. My question is, is there a way to do this in z3 where I don't have to go back and modify the variables I put into the solver? The equations obviously would get much hairier than just +1, and attempting to work through that logic manually seems error-prone and sloppy.
EDIT: After adjusting my program to use SSA as suggested, it works very easily now. I did opt to keep multiple versions of the variable, but that didn't turn into too much extra work.
You can rename variables such that they are in SSA form (https://en.wikipedia.org/wiki/Static_single_assignment_form).
Also, you probably don't need to introduce names for intermediate expressions. The only Z3 variables should be program inputs or things like that.
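For instance, here is a minimal sketch of the SSA idea applied to the x = 1; x += 1 snippet above (the variable names are illustrative):
from z3 import *

# SSA-style sketch: each assignment to x gets its own Z3 variable.
s = Solver()
x0 = Int('x0')        # value of x after `x = 1`
x1 = Int('x1')        # value of x after `x += 1`
s.add(x0 == 1)
s.add(x1 == x0 + 1)   # the increment refers to the previous version of x
print(s.check())      # sat
print(s.model())      # x0 = 1, x1 = 2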