How to identify variables introduced by the Tseitin encoding in z3? - z3

I need to count the number of solutions of a bitvector theory. I would like first to bit-blast and then call a (propositional) model counter on the resulting CNF. However, in order for the count to be equal to the number of solutions of the original theory, I have to perform the so called projected model counting (due to the added Tseitin variables). The problem is that I haven't been able to identify the correct subset of variables (those that are not added by the Tseitin encoding) that is required for this task. This is what I'm doing at the moment:
F = z3.parse_smt2_file(inst)
g = Goal()
g.add(F)
t = Then('simplify', 'bit-blast')
subgoal = t(g)
vars = z3_util.get_vars(subgoal.as_expr())
t = Tactic('tseitin-cnf')
subgoal = t(subgoal.as_expr())
print_cnf(subgoal)
Where 'vars' is the subset of variables that I need. However, when I print the CNF to a file and I run a tool performing projected model counting using those variables, the number of models returned is not correct. Any idea on how to get the correct subset of variables? (i.e how to exclude the Tseitin variables)

Related

Mean difference analysis for several variables at once

Is it possible to do mean difference analysis for several variables at once in R? But not for a couple of variables?
can i do a mean comparison for several variables at the same time? code in RStudio: wilcox.test(var1, var2, var3 ~ group, data = table1, paired = FALSE)
example dataset : var1,var2,var3 - quantitative variables; group - nominal variable.
in the end I want to get a table of mean differences with p-levels (similar to SPSS)
table1 for mean difference analysis

Can I search through and compare commonly named variables in SPSS?

I have a list of about 30 variables, all named something like test_1, test_2, test_3, etc. I need to check if the values are all the same, and typically do so by exporting to excel and using an if statement comparing the min value to the max (i.e. if the min=max then all the values are the same).
Is there a way I can do this right in SPSS without having to export? It seems inefficient to compare if test_1=test_2 and test_2=test_3 etc.
This is sort of a hack, but it get's the job done: can calculate the standard deviation of all your variables:
compute sd_test=SD(test_1, test_2, ..., test_n).
EXECUTE.
sd_test=0 for records where all test_i variables are equal.

How do I speedup adding two big vectors of tuples?

Recently, I am implementing an algorithm from a paper that I will be using in my master's work, but I've come across some problems regarding the time it is taking to perform some operations.
Before I get into details, I just want to add that my data set comprehends roughly 4kk entries of data points.
I have two lists of tuples that I've get from a framework (annoy) that calculates cosine similarity between a vector and every other vector in the dataset. The final format is like this:
[(name1, cosine), (name2, cosine), ...]
Because of the algorithm, I have two of that lists with the same names (first value of the tuple) in it, but two different cosine similarities. What I have to do is to sum the cosines from both lists, and then order the array and get the top-N highest cosine values.
My issue is: is taking too long. My actual code for this implementation is as following:
def topN(self, user, session):
upref = self.m2vTN.get_user_preference(user)
spref = self.sm2vTN.get_user_preference(session)
# list of tuples 1
most_su = self.indexer.most_similar(upref, len(self.m2v.wv.vocab))
# list of tuples 2
most_ss = self.indexer.most_similar(spref, len(self.m2v.wv.vocab))
# concat both lists and add into a dict
d = defaultdict(int)
for l, v in (most_ss + most_su):
d[l] += v
# convert the dict into a list, and then sort it
_list = list(d.items())
_list.sort(key=lambda x: x[1], reverse=True)
return [x[0] for x in _list[:self.N]]
How do I make this code faster? I've tried using threads but I'm not sure if it will make it faster. Getting the lists is not the problem here, but the concatenation and sorting is.
Thanks! English is not my native language, so sorry for any misspelling.
What do you mean by "too long"? How large are the two lists? Is there a chance your model, and interim results, are larger than RAM and thus forcing virtual-memory paging (which would create frustrating slowness)?
If you are in fact getting the cosine-similarity with all vectors in the model, the annoy-indexer isn't helping any. (Its purpose is to get a small subset of nearest-neighbors much faster, at the expense of perfect accuracy. But if you're calculating the similarity to every candidate, there's no speedup or advantage to using ANNOY.
Further, if you're going to combine all of the distances from two such calculation, there's no need for the sorting that most_similar() usually does - it just makes combining the values more complex later. For the gensim vector-models, you can supply a False-ish topn value to just get the unsorted distances to all model vectors, in order. Then you'd have two large arrays of the distances, in the model's same native order, which are easy to add together elementwise. For example:
udists = self.m2v.most_similar(positive=[upref], topn=False)
sdists = self.m2v.most_similar(positive=[spref], topn=False)
combined_dists = udists + sdists
The combined_dists aren't labeled, but will be in the same order as self.m2v.index2entity. You could then sort them, in a manner similar to what the most_similar() method itself does, to find the ranked closest. See for example the gensim source code for that part of most_similar():
https://github.com/RaRe-Technologies/gensim/blob/9819ce828b9ed7952f5d96cbb12fd06bbf5de3a3/gensim/models/keyedvectors.py#L557
Finally, you might not need to be doing this calculation yourself at all. You can provide more-than-one vector to most_similar() as the positive target, and then it will return the vectors closest to the average of both vectors. For example:
sims = self.m2v.most_similar(positive=[upref, spref], topn=len(self.m2v))
This won't be the same value/ranking as your other sum, but may behave very similarly. (If you wanted less-than-all of the similarities, then it might make sense to use the ANNOY indexer this way, as well.)

Z3 getting last valid model

I using Z3 C++ api to find a satisfiable formula that is minimal with respect to some boolean variables (let us call them b0,...,bn) being true.
I have a formula that includes boolean variables b0,...,bn and I want to find some satisfiable formula where I have the least number of b0,...,bn set to true.
I do this by initially finding a subset of b0,...,bn that can be assigned to true and satisfy my formula, and I incrementally ask the solver to find smaller subsets (i.e. where one of these boolean variables is flipped to false).
I find my local minimum when I cannot find a smaller subset, i.e. I get a unsat result from the Z3. At this point, I would like to access the last valid model.
Is that possible? Does Z3 modify the model when a call to "check" is unsat?
If so, how can I do this using the C++ api?
Many thanks in advance,
You can retrieve a model if the solver returns "sat". The model refers to the state of the solver, so if you add assertions, the state changes and models are no longer valid until you check satisfiability and it returns sat.
So you can retrieve a model every time the solver returns SAT, and then discharge all but the last model.
As Nikolaj mentioned, you need to keep track of models after each call that results in sat and return the last one when you get an unsat if you follow the strategy you outlined.
However, there might be another alternative that avoids repeated calls altogether. Instead of a satisfaction problem, you can cast your problem as an optimization one. You mentioned you have control variables b0, b1, .. bn such that you want to minimize the number of them getting set to true for a satisfying model. Create a metric that counts the number of ones in these variables. Something like:
metric = (if b0 then 1 else 0)
+ (if b1 then 1 else 0)
+ ...
+ (if bn then 1 else 0)
Then use Z3's optimization routines to minimize metric. I believe this will provide you with the solution you are looking for in one call only.
Some helpful references:
Here's the Z3 optimization tutorial: http://rise4fun.com/z3opt/tutorialcontent/guide.
This example, in particular, talks about soft-constraints, and might quite be applicable in your case as well: http://rise4fun.com/z3opt/tutorialcontent/guide#h25.
Here's the C++ API reference for the optimizer: http://z3prover.github.io/api/html/classz3_1_1optimize.html.

Getting length of vector in SPSS

I have an sav file with plenty of variables. What I would like to do now is create macros/routines that detect basic properties of a range of item sets, using SPSS syntax.
COMPUTE scale_vars_01 = v_28 TO v_240.
The code above is intended to define a range of items which I would like to observe in further detail. How can I get the number of elements in the "array" scale_vars_01, as an integer?
Thanks for info. (as you see, the SPSS syntax is still kind of strange to me and I am thinking about using Python instead, but that might be too much overhead for my relatively simple purposes).
One way is to use COUNT, such as:
COUNT Total = v_28 TO v_240 (LO THRU HI).
This will count all of the valid values in the vector. This will not work if the vector contains mixed types (e.g. string and numeric) or if the vector has missing values. An inefficient way to get the entire count using DO REPEAT is below:
DO IF $casenum = 1.
COMPUTE Total = 0.
DO REPEAT V = v_28 TO V240.
COMPUTE Total = Total + 1.
END REPEAT.
ELSE.
COMPUTE Total = LAG(Total).
END IF.
This will work for mixed type variables, and will count fields with missing values. (The DO IF would work the same for COUNT, this forces a data pass, but for large datasets and large lists will only evaluate for the first case.)
Python is probably the most efficient way to do this though - and I see no reason not to use it if you are familiar with it.
BEGIN PROGRAM.
import spss
beg = 'X1'
end = 'X10'
MyVars = []
for i in xrange(spss.GetVariableCount()):
x = spss.GetVariableName(i)
MyVars.append(x)
len = MyVars.index(end) - MyVars.index(beg) + 1
print len
END PROGRAM.
Statistics has a built-in macro facility that could be used to define sets of variables, but the Python apis provide much more powerful ways to access and use the metadata. And there is an extension command SPSSINC SELECT VARIABLES that can define macros based on variable metadata such as patterns in names, measurement level, type, and other properties. It generates a macro listing these variables that can then be used in standard syntax.

Resources