Speed up Z3 to CNF conversion

I have a predicate that operates on 32-bit numbers, and I want to convert it to CNF.
I'm using Z3py for that, in the following way:
value = compute_value(BitVec("x", 32))
threshold = compute_threshold(BitVec("y", 32))
is_valid = value < threshold
g = Goal()
g.add(is_valid)
t = Then('simplify', 'bit-blast', 'tseitin-cnf')
result = t(g)
print(result)
The listings of compute_value and compute_threshold are a few hundred lines of code each, so I don't think it's reasonable to show them here. The output of both is also a 32-bit BitVec.
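(For reference, a minimal, self-contained version of the snippet above; the two helper bodies here are purely hypothetical placeholders standing in for the real few-hundred-line listings.)
from z3 import *
# purely hypothetical stand-ins for the real compute_value / compute_threshold;
# like the real ones, they take and return 32-bit BitVec expressions
def compute_value(x):
    return (x * 3) + 7
def compute_threshold(y):
    return (y ^ 0x5A5A5A5A) + 1
value = compute_value(BitVec("x", 32))
threshold = compute_threshold(BitVec("y", 32))
is_valid = value < threshold
g = Goal()
g.add(is_valid)
t = Then('simplify', 'bit-blast', 'tseitin-cnf')
result = t(g)
print(result)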
I also tried the tseitin-cnf-core and elim-and tactics, but I always have to abort the computation because Z3 consumes all of my RAM.
That seems strange: as far as I know, Tseitin conversion requires only O(n) time and space. I suspect Z3 performs unnecessary simplifications of the formula, and that this is what consumes so many resources.
Which tactics should I use to speed this up? Or do you know of a better tool than Z3 for converting a predicate with arithmetic expressions to CNF?

Related

Obtaining representation of SMT formula as SAT formula

I've come up with an SMT formula in Z3 which outputs one solution to a constraint solving problem using only BitVectors and IntVectors of fixed length. The logic I use for the IntVectors is only simple Presburger arithmetic (of the form (x[i] - x[i + 1] <=/>= z) for some x and z). I also take the sum of all of the bits in the bitvector (NOT the binary value), and set that value to be within a range of [a, b].
This works perfectly. The only problem is that, since Z3 works by always taking the easiest path towards determining satisfiability, I always get the same answer back, whereas in my domain I'd like to find a variety of substantially different solutions (I know for a fact that multiple, very different solutions exist). I'd like to use this nifty tool I found, https://bitbucket.org/kuldeepmeel/weightgen, which lets you uniformly sample a constrained space of possibilities using SAT. To use this, though, I need to convert my SMT formula into a SAT formula.
Do you know of any resources that would help me learn how to encode Presburger arithmetic and the bit-sum of a bitvector as a SAT instance? Alternatively, do you know of any SMT solver which, as an intermediate step, outputs a readable description of the problem as a SAT instance?
Many thanks!
[Edited to reflect the fact that I really do need the uniform sampling feature.]
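For the bit-vector part of such a formula, one route is to let Z3 bit-blast the goal and dump the result in DIMACS form, which SAT-based tools like WeightGen consume. A minimal sketch, assuming a reasonably recent z3py where Goal.dimacs() is available (the variables and constraints shown are illustrative only):
from z3 import *
x = BitVec('x', 8)
y = BitVec('y', 8)
g = Goal()
g.add(x + y == 42, x - y >= 3)  # illustrative bit-vector constraints
# bit-blast to propositional logic, then convert to CNF
cnf = Then('simplify', 'bit-blast', 'tseitin-cnf')(g)
# cnf[0] is a CNF goal over Boolean literals; print it in DIMACS format
print(cnf[0].dimacs())
The Presburger-style integer constraints would first have to be encoded over bit-vectors of sufficient width for this to cover the whole formula.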

Fast Exact Solvers for Chromatic Number

Finding the chromatic number of a graph is an NP-hard problem, so there isn't a fast solver 'in theory'. Is there any publicly available software that can compute the exact chromatic number of a graph quickly?
I'm writing a Python script that computes the chromatic number of many graphs, but it is taking too long even for small graphs. The graphs I am working with span a wide range and can be sparse or dense, but they usually have fewer than 10,000 nodes. I formulated the problem as an integer program and passed it to Gurobi to solve. Do you have recommendations for software, different IP formulations, or Gurobi settings to speed this up?
import networkx as nx
from gurobipy import *
# create test graph
n = 50
p = 0.5
G = nx.erdos_renyi_graph(n, p)
# compute chromatic number -- ILP solve
m = Model('chrom_num')
# upper bound on the number of colors needed (max degree + 1 always suffices)
k = max(dict(G.degree()).values()) + 1
# create k binary variables, y_0 ... y_{k-1}, to indicate whether color j is used
y = []
for j in range(k):
    y.append(m.addVar(vtype=GRB.BINARY, name='y_%d' % j))
# create n * k binary variables, x_{l,j}, that is 1 if node l is colored with j
x = []
for l in range(n):
    x.append([])
    for j in range(k):
        x[-1].append(m.addVar(vtype=GRB.BINARY, name='x_%d_%d' % (l, j)))
# objective function is to minimize the colors used --> sum of y_0 ... y_{k-1}
m.setObjective(quicksum(y), GRB.MINIMIZE)
m.update()
# add constraint -- each node gets exactly one color
for u in range(n):
    m.addConstr(quicksum(x[u]) == 1, name='NC_%d' % u)
# add constraint -- keep track of colors used (y_j is set high if color j is used anywhere)
for u in range(n):
    for j in range(k):
        m.addConstr(x[u][j] <= y[j], name='SH_%d_%d' % (u, j))
# add constraint -- adjacent nodes have different colors
for u in range(n):
    for v in G[u]:
        if v > u:
            for j in range(k):
                m.addConstr(x[u][j] + x[v][j] <= 1, name='ADJ_%d_%d_COL_%d' % (u, v, j))
# update model, solve, read off the chromatic number
m.update()
m.optimize()
chrom_num = m.objVal
I am looking to compute exact chromatic numbers, although I would also be interested in algorithms that compute approximate chromatic numbers if they come with reasonable theoretical guarantees, such as a constant-factor approximation.
You might want to try a SAT solver or a Max-SAT solver. I expect that they will work better than a reduction to an integer program, since I think colorability is closer to satisfiability.
SAT solvers receive a propositional Boolean formula in Conjunctive Normal Form (CNF) and report whether the formula is satisfiable. The following problem COL_k is in NP:
Input: a graph G and a natural number k.
Output: whether G is k-colorable.
To solve COL_k, you encode it as a propositional Boolean formula with one propositional variable for each pair (u,c) consisting of a vertex u and a color 1 <= c <= k. You need clauses which ensure that every vertex is colored by at least one color, and clauses which ensure that every edge is properly colored (its endpoints receive different colors).
Then you just do a binary search to find the value of k such that G is k-colorable but not (k-1)-colorable.
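A sketch of that encoding in Python, with my own (hypothetical) variable numbering; the output is a DIMACS CNF string that any standard solver such as MiniSAT or Lingeling can read:
import networkx as nx
def col_k_dimacs(G, k):
    # encode "G is k-colorable" as a DIMACS CNF string
    nodes = list(G.nodes())
    index = {u: i for i, u in enumerate(nodes)}
    def var(u, c):
        # propositional variable for "vertex u gets color c" (DIMACS variables are 1-based)
        return index[u] * k + c + 1
    clauses = []
    # every vertex gets at least one color
    for u in nodes:
        clauses.append([var(u, c) for c in range(k)])
    # adjacent vertices never share a color
    for u, v in G.edges():
        for c in range(k):
            clauses.append([-var(u, c), -var(v, c)])
    lines = ['p cnf %d %d' % (len(nodes) * k, len(clauses))]
    lines += [' '.join(map(str, cl)) + ' 0' for cl in clauses]
    return '\n'.join(lines)
# usage: write an instance to disk and hand it to the SAT solver of your choice
G = nx.erdos_renyi_graph(20, 0.3)
with open('col_4.cnf', 'w') as out:
    out.write(col_k_dimacs(G, 4))
Wrapping this in a binary (or simply decreasing) search over k, re-running the solver each time, then yields the chromatic number as described above.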
There are various free SAT solvers. I have used Lingeling successfully, but you can find many others on the SAT competition website. They all use the same input and output format; Google "MiniSAT User Guide: How to use the MiniSAT SAT Solver" for an explanation of this format.
You can also use a Max-SAT solver; again, consult the Max-SAT competition website. These solvers handle the Partial Max-SAT problem, in which the clauses are partitioned into hard clauses and soft clauses. The solver finds the maximum number of soft clauses that can be satisfied while also satisfying all of the hard clauses; see the input format on the Max-SAT competition website (under rules -> details).
You can formulate the chromatic number problem as a single Max-SAT problem (as opposed to several SAT problems as above). In this sense, Max-SAT is a better fit. On the other hand, I have the impression that SAT solvers generally perform better than Max-SAT solvers. I don't have any experience with this kind of solver, so I cannot say much more.
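For completeness, a hedged sketch of that single Max-SAT formulation in the classic WCNF format used by Max-SAT competition solvers (variable numbering and weights are my own choices; check the input rules of the particular solver you use):
def chromatic_wcnf(G, k):
    # hard clauses enforce a proper coloring with at most k colors;
    # soft unit clauses (weight 1) reward leaving individual colors unused
    nodes = list(G.nodes())
    index = {u: i for i, u in enumerate(nodes)}
    def x(u, c):
        return index[u] * k + c + 1              # "vertex u has color c"
    def y(c):
        return len(nodes) * k + c + 1            # "color c is used"
    hard, soft = [], []
    for u in nodes:
        hard.append([x(u, c) for c in range(k)])     # at least one color per vertex
        for c in range(k):
            hard.append([-x(u, c), y(c)])            # using color c forces y(c)
    for u, v in G.edges():
        for c in range(k):
            hard.append([-x(u, c), -x(v, c)])        # endpoints get different colors
    for c in range(k):
        soft.append([-y(c)])                         # prefer each color to stay unused
    top = len(soft) + 1                              # weight that marks clauses as hard
    nvars = len(nodes) * k + k
    lines = ['p wcnf %d %d %d' % (nvars, len(hard) + len(soft), top)]
    lines += ['%d %s 0' % (top, ' '.join(map(str, cl))) for cl in hard]
    lines += ['1 %s 0' % ' '.join(map(str, cl)) for cl in soft]
    return '\n'.join(lines)
In an optimal solution the chromatic number is the number of y-variables left true, i.e. k minus the number of satisfied soft clauses.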

Sum of all the bits in a Bit Vector of Z3

Given a bit vector in Z3, I am wondering how I can sum up the individual bits of this vector.
E.g.,
a = BitVecVal(3, 2)
sum_all_bit(a) = 2
Are there any pre-implemented APIs/functions that support this? Thank you!
It isn't part of the bit-vector operations.
You can create an expression as follows:
from functools import reduce
from z3 import *
def sub(b):
    n = b.size()
    bits = [Extract(i, i, b) for i in range(n)]
    # zero-extend each 1-bit slice back to n bits so they can be added
    bvs = [Concat(BitVecVal(0, n - 1), bit) for bit in bits]
    nb = reduce(lambda a, b: a + b, bvs)
    return nb
print(sub(BitVecVal(4, 7)))
Of course, log(n) bits for the result will suffice if you prefer.
The page:
https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
has various algorithms for counting the bits; which can be translated to Z3/Python with relative ease, I suppose.
My favorite is: https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
which has the nice property that it loops only as many times as there are set bits in the input. (But you shouldn't extrapolate from that to any meaningful complexity metric, as you do arithmetic in each iteration, which might be costly. The same is true for all of these algorithms.)
Having said that, if your input is fully symbolic, you can't really beat the simple iterative algorithm, as you can't short-cut the iteration count. The above methods might work faster if the input has concrete bits.
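To make that last point concrete, here is a hedged z3py sketch of Kernighan's trick; with a fully symbolic input the loop must be unrolled for the full width, so the "iterate only over set bits" advantage disappears:
from z3 import *
def popcount_kernighan(b):
    # unrolled Kernighan-style popcount: each step clears the lowest set bit
    n = b.size()
    w = n.bit_length()                      # counter wide enough to hold n
    count = BitVecVal(0, w)
    cur = b
    for _ in range(n):                      # fully unrolled: no short-cut for symbolic input
        count = If(cur != 0, count + 1, count)
        cur = cur & (cur - 1)
    return count
x = BitVec('x', 8)
s = Solver()
s.add(popcount_kernighan(x) == 3)
print(s.check())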
So you're computing the Hamming weight of a bit vector. One of the developers gave this answer to a previous question of mine; based on that original answer, this is how I do it today:
from math import ceil, log2
from z3 import *
def HW(bvec):
    # zero-extend each extracted bit so the sum is wide enough to hold bvec.size()
    return Sum([ZeroExt(int(ceil(log2(bvec.size()))), Extract(i, i, bvec))
                for i in range(bvec.size())])
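For instance, the definition above can be used directly inside constraints (a small, hypothetical usage):
x = BitVec('x', 32)
s = Solver()
s.add(HW(x) == 3)       # ask for a 32-bit value with exactly three bits set
print(s.check())        # sat
print(s.model()[x])     # some model value whose popcount is 3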

z3py: What is a correct way of asserting a constraint of "something does not exist"

I want to assert a constraint that "something must not exist" in z3py. I tried using "Not(Exists(...))". A simple example is as follows: I want to find an assignment for a and b such that no such c exists.
from z3 import *
s = Solver()
a = Int('a')
b = Int('b')
c = Int('c')
s.add(a + b == 5)
s.add(Not(Exists([c], And(c > 0, c < 5, a*b + c == 10))))
print(s.check())
print(s.model())
The output is
sat
[b = 5, a = 0]
Which seems to be correct.
But when I write a "Not(Exists(...))" constraint in a more complex problem, it runs for hours without producing a solution.
I wonder whether this is the correct and most efficient way to assert a "does not exist" constraint, or whether such problems with quantifiers are intrinsically hard for any solver.
The way you wrote that constraint is just fine. And it is not surprising that Z3 (or any other solver) has a hard time with such problems, since you have both quantifiers and non-linear arithmetic; such problems are intrinsically hard to solve.
You might look into Z3's nlsat tactic, which might provide some relief here: How does Z3 handle non-linear integer arithmetic?
Or you can try reals instead of integers, or bit-vectors (i.e., machine integers). Whether you can actually use these types depends on your problem domain: reals will obviously admit "fractional" values, and bit-vectors are subject to modular arithmetic.
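For illustration, a sketch of the same toy constraint over 32-bit vectors; the comparisons and the product become modular (and signed, as written), so whether this matches the intended semantics depends on the problem:
from z3 import *
a, b, c = BitVecs('a b c', 32)
s = Solver()
s.add(a + b == 5)
# the same "no such c exists" constraint, now as a quantified bit-vector formula
s.add(Not(Exists([c], And(c > 0, c < 5, a * b + c == 10))))
print(s.check())
print(s.model())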

Z3: Function Expansion and Encoding in QBVF

I am trying to encode formulas with functions in Z3 and I have an encoding problem. Consider the following example:
f(x) = x + 42
g(x1, x2) = f(x1) / f(x2)
h(x1, x2) = g(x1, x2) % g(x2, x1)
k(x1, x2, x3) = h(x1, x2) - h(x2, x3)
sat( k(y1, y2, y3) == 42 && k(y3, y2, y1) == 42 * 2 && ... )
I would like my encoding to be both efficient (no expression duplication) and to allow Z3 to re-use lemmas about the functions across subproblems. Here is what I have tried so far:
1. Inline the functions for every instantiation of the free variables y1, y2, etc. This introduces duplication, and performance is not as good as I hoped for.
2. Assert the function declarations with universal quantifiers. This works for very specific examples: from the solving times it seems that Z3 can (?) re-use results from previous queries that involve the same functions. However, solving times vary greatly, and in many cases (1) turns out to be faster.
3. Use function definitions (i.e., quantifiers plus the MACRO_FINDER option). If my understanding of the documentation is correct, this should expand the functions and thus should be close to (1). However, in terms of performance the results were a bit surprising (">" means faster):
For problems where (1) > (2) I get: (1) > (3) > (2)
For problems where (2) > (1) I get: (2) > (1) = (3)
I have also tried tweaking the MBQI option (and others) with most of the above, but it is not clear what the best combination is. I am using Z3 4.0.
The question is: what is the "right" way to encode the problem? Note that I only have interpreted functions (I do not really need uninterpreted functions). Could I use this fact for a more efficient encoding and avoid function expansion?
Thanks
I think there's no clear answer to this question. Some techniques work better for one type of benchmark and other techniques work better for others. For the QBVF benchmarks we've looked at so far, we found that macros give us the best combination of small benchmark size and small solving times, but this may not apply in your case.
Your understanding of the documentation is correct: the macro finder will identify quantifiers that look like function definitions and replace all other calls to that function with its definition. It's possible that not all of your macros are picked up, or that you are using quasi-macros which aren't detected correctly, either of which could go towards explaining why the performance is sometimes worse than with (1). How large is the difference in the case where (1) > (3)? A little overhead is to be expected, but vast variations in runtime are probably due to some macros being malformed or not being detected.
In general, there is no "right" way to encode these problems. Function expansion cannot always be avoided. The trade-off is essentially between expanding eagerly (1, 3) or doing it lazily (2). It may be that there is a correlation of the type SAT (1, 3 faster) and UNSAT (2 faster), but this is not guaranteed to be the case either.
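As a concrete illustration of option (2), here is a hedged z3py sketch that declares the functions from the example as uninterpreted symbols and constrains them with universally quantified definitions. The use of unsigned division/remainder is my own assumption (the example does not say which is intended), and the MACRO_FINDER-style option is left out since its exact name varies across Z3 versions:
from z3 import *
BV = BitVecSort(32)
f = Function('f', BV, BV)
g = Function('g', BV, BV, BV)
h = Function('h', BV, BV, BV)
k = Function('k', BV, BV, BV, BV)
x1, x2, x3 = BitVecs('x1 x2 x3', 32)
defs = [
    ForAll([x1], f(x1) == x1 + 42),
    ForAll([x1, x2], g(x1, x2) == UDiv(f(x1), f(x2))),
    ForAll([x1, x2], h(x1, x2) == URem(g(x1, x2), g(x2, x1))),
    ForAll([x1, x2, x3], k(x1, x2, x3) == h(x1, x2) - h(x2, x3)),
]
y1, y2, y3 = BitVecs('y1 y2 y3', 32)
s = Solver()
s.add(*defs)                        # lazy, quantified definitions (option 2)
s.add(k(y1, y2, y3) == 42, k(y3, y2, y1) == 42 * 2)
print(s.check())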
