Sum of all the bits in a Bit Vector of Z3 - z3

Given a bit vector in Z3, how can I sum up the individual bits of this vector?
E.g.,
a = BitVecVal(3, 2)
sum_all_bit(a) = 2
Are there any pre-implemented APIs/functions that support this? Thank you!

It isn't part of the bit-vector operations.
You can create an expression as follows:
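from functools import reduce
from z3 import *

def sub(b):
    n = b.size()
    # extract every bit as a 1-bit vector
    bits = [ Extract(i, i, b) for i in range(n) ]
    # zero-pad each bit to n bits so that the sum cannot overflow
    bvs = [ Concat(BitVecVal(0, n - 1), bit) for bit in bits ]
    nb = reduce(lambda x, y: x + y, bvs)
    return nb

print(sub(BitVecVal(4, 7)))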
Of course, about log2(n) bits for the result will suffice if you prefer (floor(log2(n)) + 1 bits, to be exact, since the sum is at most n).

The page:
https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
has various algorithms for counting the bits, which can be translated to Z3/Python with relative ease, I suppose.
My favorite is: https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
which has the nice property that it loops as many times as there are set bits in the input. (But you shouldn't extrapolate from that to any meaningful complexity metric, as you do arithmetic in each loop, which might be costly. The same is true for all these algorithms.)
Having said that, if your input is fully symbolic, you can't really beat the simple iterative algorithm, as you can't short-cut the iteration count. The methods above might work faster if the input has concrete bits.
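For a concrete (non-symbolic) value, Kernighan's method from the page linked above can be sketched in plain Python roughly as follows. This is only an illustration on ordinary integers, not Z3 code, and the function name popcount_kernighan is made up for this example.

```python
def popcount_kernighan(x):
    """Count the set bits of a non-negative integer.

    The loop body runs once per set bit, because x & (x - 1)
    clears the lowest set bit of x.
    """
    if x < 0:
        raise ValueError("expected a non-negative integer")
    count = 0
    while x:
        x &= x - 1  # clear the lowest set bit
        count += 1
    return count


# The question's example: 3 is 0b11, so both bits are set.
print(popcount_kernighan(3))
```

With a symbolic input the number of loop iterations is not known in advance, which is why the bit-by-bit sum remains the practical choice there.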

So you're computing the Hamming Weight of a bit vector. Based on a previous question I had, one of the developers had this answer. Based on that original answer, this is how I do it today:
from math import ceil, log2
from z3 import Extract, Sum, ZeroExt

def HW(bvec):
    return Sum([ ZeroExt(int(ceil(log2(bvec.size()))), Extract(i,i,bvec)) for i in range(bvec.size())])

Related

How to cut down train error for a dense-matrix factorization task?

This problem may seem very different from the usual matrix factorization task that is widely used in recommender systems.
My problem is described below:
Given a dense matrix M
(approximately 55000*200; it may contain many negative elements; 0.1 < abs(M[i][j]) < 1)
I have to find two matrices A (55000*1400) and B (1400*200) such that:
AB=M
However, we have some knowledge about A. We have another matrix C: if C[i][j] = 0, then A[i][j] must be zero; otherwise (C[i][j] = 1) it can take any value.
In practice, I use machine learning to solve the problem. My loss function is:
|| (A ∘ C) B - M ||_2, where ∘ is the element-wise product and ||.||_2 is the L2 norm
I have tried Adagrad, momentum, Adadelta and some other optimization methods, but the training error stays high and decreases slowly (learning_rate = 0.1).
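To make the loss above concrete, here is a minimal NumPy sketch of the masked factorization. It is not the asker's code: it uses plain full-batch gradient descent rather than the per-element updates described in the question, minimizes half the squared norm, and runs on tiny synthetic matrices. The sizes, the learning rate, and the function names masked_loss and gradient_step are all assumptions made for illustration.

```python
import numpy as np


def masked_loss(A, B, C, M):
    """Half the squared Frobenius norm of (A * C) @ B - M."""
    R = (A * C) @ B - M
    return 0.5 * np.sum(R * R)


def gradient_step(A, B, C, M, lr):
    """One full-batch gradient step that updates A and B together."""
    AC = A * C                 # apply the 0/1 mask element-wise
    R = AC @ B - M             # residual
    grad_A = (R @ B.T) * C     # masked entries of A get zero gradient
    grad_B = AC.T @ R
    return A - lr * grad_A, B - lr * grad_B


rng = np.random.default_rng(0)
n, r, d = 30, 8, 20            # toy sizes, far smaller than 55000 x 1400 x 200
C = (rng.random((n, r)) < 0.3).astype(float)           # sparse 0/1 mask
M = (rng.standard_normal((n, r)) * C) @ rng.standard_normal((r, d))
A = 0.1 * rng.standard_normal((n, r))
B = 0.1 * rng.standard_normal((r, d))

initial_loss = masked_loss(A, B, C, M)
for _ in range(500):
    A, B = gradient_step(A, B, C, M, lr=0.002)
final_loss = masked_loss(A, B, C, M)
print(initial_loss, final_loss)
```

Both factors are updated in every step here, so neither one lags behind the other.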
UP1:
Well, actually I've got a machine with 32GB of memory and I only need 2 minutes for each epoch. I decompose an element of M only if its corresponding element in C is annotated as 1. In practice, I only decompose M[i][j] when C[i][j] = 1, and after I decompose M[i][j], I compute the gradient for M[i][j] to update A[i : ] and B[ : j] at once. So the batch I use is very small: it contains just one element. Also, I have to mention that C is a pretty sparse matrix. In each row of C, only 2-3 elements are annotated as 1.
After struggling with it for nearly half a month, I finally got the answer: I should update the matrix A much more frequently, that is, update its parameters in smaller steps. I originally updated every element in A only once per epoch, much less often than B. However, after I changed the code so that A is updated at the same rate as B, a surprise happened: it worked pretty well!
Maybe smaller steps will help SGD work better? I don't really believe it mathematically.

Fast Exact Solvers for Chromatic Number

Finding the chromatic number of a graph is an NP-Hard problem, so there isn't a fast solver 'in theory'. Is there any publicly available software that can compute the exact chromatic number of a graph quickly?
I'm writing a Python script that computes the chromatic number of many graphs, but it is taking too long even for small graphs. I am working with a wide range of graphs that can be sparse or dense, but usually have fewer than 10,000 nodes. I formulated the problem as an integer program and passed it to Gurobi to solve. Do you have recommendations for software, different IP formulations, or different Gurobi settings to speed this up?
import networkx as nx
from gurobipy import *

# create test graph
n = 50
p = 0.5
G = nx.erdos_renyi_graph(n, p)

# compute chromatic number -- ILP solve
m = Model('chrom_num')

# get maximum number of variables necessary
k = max(dict(G.degree()).values()) + 1

# create k binary variables, y_0 ... y_{k-1} to indicate whether color k is used
y = []
for j in range(k):
    y.append(m.addVar(vtype=GRB.BINARY, name='y_%d' % j, obj=1))

# create n * k binary variables, x_{l,j} that is 1 if node l is colored with j
x = []
for l in range(n):
    x.append([])
    for j in range(k):
        x[-1].append(m.addVar(vtype=GRB.BINARY, name='x_%d_%d' % (l, j), obj=0))

# objective function is minimize colors used --> sum of y_0 ... y_{k-1}
m.update()
m.setObjective(quicksum(y), GRB.MINIMIZE)

# add constraint -- each node gets exactly one color (sum of colors used is 1)
for u in range(n):
    m.addConstr(quicksum(x[u]) == 1, name='NC_%d' % u)

# add constraint -- keep track of colors used (y_j is set high if any time j is used)
for u in range(n):
    for j in range(k):
        m.addConstr(x[u][j] <= y[j], name='SH_%d_%d' % (u,j))

# add constraint -- adjacent nodes have different colors
for u in range(n):
    for v in G[u]:
        if v > u:
            for j in range(k):
                m.addConstr(x[u][j] + x[v][j] <= 1, name='ADJ_%d_%d_COL_%d' % (u,v,j))

# update model, solve, return the chromatic number
m.update()
m.optimize()
chrom_num = m.objVal
I am looking to compute exact chromatic numbers although I would be interested in algorithms that compute approximate chromatic numbers if they have reasonable theoretical guarantees such as constant factor approximation, etc.
You might want to try a SAT solver or a Max-SAT solver. I expect that they will work better than a reduction to an integer program, since I think colorability is closer to satisfiability.
SAT solvers receive a propositional Boolean formula in Conjunctive Normal Form and output whether the formula is satisfiable. The following problem COL_k is in NP:
Input: Graph G and natural number k.
Output: whether G is k-colorable.
To solve COL_k you encode it as a propositional Boolean formula with one propositional variable for each pair (u,c) consisting of a vertex u and a color 1<=c<=k. You need to write clauses which ensure that every vertex is colored by at least one color. You also need clauses to ensure that each edge is proper, i.e. that its two endpoints do not share a color.
Then you just do a binary search to find the value of k such that G is k-colorable but not (k-1)-colorable.
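The encoding described above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the helper names coloring_clauses, to_dimacs and brute_force_sat are made up here, colors are numbered from 0 inside the code, and the brute-force check is exponential and only meant for sanity-testing tiny graphs before handing the DIMACS text to a real SAT solver.

```python
from itertools import product


def coloring_clauses(num_vertices, edges, k):
    """CNF clauses (lists of signed ints) satisfiable iff the graph is k-colorable."""
    def var(u, c):
        return u * k + c + 1  # DIMACS variables are numbered from 1

    clauses = []
    # every vertex gets at least one color
    for u in range(num_vertices):
        clauses.append([var(u, c) for c in range(k)])
    # the endpoints of an edge never share a color
    for u, v in edges:
        for c in range(k):
            clauses.append([-var(u, c), -var(v, c)])
    return clauses


def to_dimacs(clauses, num_vars):
    """Render the clauses in the DIMACS CNF text format."""
    lines = ["p cnf %d %d" % (num_vars, len(clauses))]
    lines += [" ".join(map(str, clause)) + " 0" for clause in clauses]
    return "\n".join(lines)


def brute_force_sat(clauses, num_vars):
    """Try every assignment; usable only for a handful of variables."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


triangle = [(0, 1), (1, 2), (0, 2)]
print(to_dimacs(coloring_clauses(3, triangle, 3), 3 * 3))
```

Running a solver on the formulas for successive values of k (or with a binary search over k) then gives the chromatic number.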
There are various free SAT solvers. I have used Lingeling successfully, but you can find many others on the SAT competition website. They all use the same input and output format. Google "MiniSAT User Guide: How to use the MiniSAT SAT Solver" for an explanation on this format.
You can also use a Max-SAT solver, again consult the Max-SAT competition website. They can solve the Partial Max-SAT problem, in which clauses are partitioned into hard clauses and soft clauses. Here, the solver finds the maximal number of soft clauses which can be satisfied while also satisfying all of the hard clauses, see the input format in the Max-SAT competition website (under rules->details).
You can formulate the chromatic number problem as one Max-SAT problem (as opposed to several SAT problems as above). In this sense, Max-SAT is a better fit. On the other hand, I have the impression that SAT solvers generally perform better than Max-SAT solvers. I don't have any experience with this kind of solver, so cannot say anything more.

Naive Bayes Confusion

I'm working on homework for my machine learning course and am having trouble understanding the question on Naive Bayes. The problem I have is a variation of question number 2 on the following page:
https://www.cs.utexas.edu/~mooney/cs343/hw3-old/hw3.html
The numbers I have are slightly different, so I'll replace the numbers from my assignment with the example above. I'm currently attempting to figure out the probability that the first text is physics. To do so, I have something that looks a little like this:
P(physics|c) = P(physics) * P(carbon|physics) * P(atom|physics) * P(life|physics) * P(earth|physics) / [SOMETHING]
P(physics|c) = .35 * .005 * .1 * .001 * .005 / [SOMETHING]
I'm basing this off of an example that I've seen in my notes, but I can't seem to figure out what I'm supposed to divide by. I'll provide the example from the notes as well.
Perhaps I'm going about this in the wrong way, but I'm unsure where the P(X) term that we're dividing by is coming from. How does this relate to the probability that the text is physics? I feel that getting this issue resolved will make the remainder of the assignment simple.
The denominator P(X) is just the sum of P(X|Y)*P(Y) for all possible classes.
Now, it's important to note that in Naive Bayes, you do not have to compute this P(X). You only have to compute P(X|Y)*P(Y) for each class, and then select the class that produced the highest probability.
In your case, I assume you must have several classes. You mentioned physics, but there must be others like chemistry or math.
So you can compute:
P(physics|X) = P(X|physics) * P(physics) / P(X)
P(chemistry|X) = P(X|chemistry) * P(chemistry) / P(X)
P(math|X) = P(X|math) * P(math) / P(X)
P(X) is the sum of P(X|Y)*P(Y) for all classes:
P(X) = P(X|physics)*P(physics) + P(X|chemistry)*P(chemistry) + P(X|math)*P(math)
(By the way, the above statement is exactly analogous to the example in the image that you provided. The equations are a bit complicated there, but if you rearrange them, you will find that P(X) = P(X|positive)*P(positive) + P(X|negative)*P(negative) in that example).
To produce the answer (that is, to determine Y among physics, chemistry, or math), you would select the maximum value among P(physics|X), P(chemistry|X), and P(math|X).
As I mentioned, you do not need to compute P(X) because this term exists in the denominator of all of P(physics|X), P(chemistry|X), and P(math|X). Thus, you only need to determine the max among P(X|physics)*P(physics), P(X|chemistry)*P(chemistry), and P(X|math)*P(math).
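A short Python sketch of this computation, under loudly stated assumptions: only the physics numbers (.35 and the four word probabilities) come from the question; the chemistry and math priors and likelihoods below are invented purely so that the example runs, and the function name posteriors is hypothetical.

```python
def posteriors(priors, likelihoods):
    """Return (normalized posteriors, unnormalized P(X|Y) * P(Y)) per class."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())  # P(X): sum of P(X|Y) * P(Y) over all classes
    return {c: joint[c] / evidence for c in joint}, joint


# physics: numbers from the question; chemistry and math: made-up placeholders
priors = {"physics": 0.35, "chemistry": 0.40, "math": 0.25}
likelihoods = {
    "physics": 0.005 * 0.1 * 0.001 * 0.005,
    "chemistry": 0.001 * 0.05 * 0.002 * 0.001,   # hypothetical
    "math": 0.0001 * 0.01 * 0.001 * 0.001,       # hypothetical
}

post, joint = posteriors(priors, likelihoods)
predicted = max(joint, key=joint.get)  # P(X) is not needed to pick the class
print(predicted, post)
```

Dividing by P(X) only rescales the three numbers so that they sum to 1; it never changes which class is largest.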
The point is that you don't really need a value for P(X) because it is the same for all classes. So you should ignore it and just compare the numbers before the division step. The highest number is the predicted class.
The reason it appears in the equation is that it comes from Bayes' rule:
P(C1|X) = P(X|C1) * P(C1) / P(X)

Why is a good choice of mod "a prime not too close to an exact power of 2"?

To build a hash function, map a key k into one of m slots by taking the remainder of k divided by m. That is, the hash function is
h(k) = k mod m.
I have read at several places that a good choice of m will be
A prime - I understand that we want to remove common factors, hence a prime number is chosen
not too close to an exact power of 2 - why is that?
From Introduction to Algorithms:
When using the division method we avoid certain values of m. For example, m should not be a power of 2, since if m=2^p then h(k) is just the p lowest-order bits of k. Unless it is known that all low-order p-bit patterns are equally likely, it is better to make the hash function depend on all bits of the key.
As you can see from the image below, if I choose 2^3, which means p=3 and m=8, the hashed keys depend only on the lowest 3 (p) bits. This is bad because when you hash you want to include as much of the key's data as possible to get a good distribution.
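The effect is easy to reproduce with a few lines of Python. In this sketch the key set (multiples of 8) and the prime 11 are arbitrary choices made for illustration.

```python
def h(k, m):
    """Division-method hash: slot index of key k in a table with m slots."""
    return k % m


# keys whose lowest 3 bits are all zero, e.g. 8-byte aligned addresses
keys = [8 * i for i in range(100)]

slots_pow2 = {h(k, 8) for k in keys}    # m = 2^3: only the low 3 bits matter
slots_prime = {h(k, 11) for k in keys}  # m = 11: the higher bits contribute too

print(sorted(slots_pow2))
print(sorted(slots_prime))
```

With m = 8 every key lands in the same slot, because k mod 8 is exactly k & 7, while with m = 11 the same keys spread over all slots.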

OpenCV: In FLANN module KDTree constructor creates 4 trees of same size. why?

In the FLANN module, the KDTree constructor takes configuration params for creating trees. I see the default value is 4. Can someone please explain why 4, or why more than one tree is required for nearest neighbor search?
/**
 * KDTree constructor
 *
 * Params:
 *          inputData = dataset with the input features
 *          params = parameters passed to the kdtree algorithm
 */
KDTreeIndex(const Matrix<ElementType>& inputData, const IndexParams& params = KDTreeIndexParams(),
            Distance d = Distance() ) :
    dataset_(inputData), index_params_(params), distance_(d)
{
    size_ = dataset_.rows;
    veclen_ = dataset_.cols;
    trees_ = get_param(index_params_,"trees",4);  // <-- default is 4
    tree_roots_ = new NodePtr[trees_];
    // Create a permutable array of indices to the input vectors.
    vind_.resize(size_);
    for (size_t i = 0; i < size_; ++i) {
I guess the trees aren't searched till the end; they also vary slightly in the way they are constructed. So when 4 trees are partially searched, their combined performance is much better than that of a single tree partially searched. At the same time they work faster (and possibly more memory-efficiently) than a single tree fully searched. At least this is my intuition about this issue.
FLANN stands for Fast Library for Approximate Nearest Neighbors. The word "approximate" can be the key here. For example, at some point in tree construction one has to find the median of a set of points (to split the space efficiently into approximately equal halves). This takes n*log(n) operations, but the median can be approximated from a smaller sample, say of size min(100, n/100) (I am making this up to give an example). This will result in a speed-up of approximately x600 for construction at the small price of a slightly slower search. Also, instead of exploring all dimensions for the greatest variance to choose a split, we may look at a limited set. This explains why the trees can be different.
Another aspect here: approximating the nearest neighbour becomes more important in higher-dimensional spaces, since a standard kd-tree loses its efficiency compared to an exhaustive search. One way to approximate is to inspect only a limited number of leaves to satisfy a search precision. Often a priority queue is maintained over all the trees to order the search by increasing distance. In sum, speed and memory efficiency can be greatly improved with approximation and multiple trees, with only a small loss in precision.
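The sampled-median idea can be illustrated with a small Python sketch. This is not FLANN's actual code; the function name approx_median, the sample size of 101 and the test data are assumptions chosen only to show how a split value can be estimated from a random subset.

```python
import random
import statistics


def approx_median(values, sample_size, rng):
    """Estimate the median from a random sample instead of all values."""
    if sample_size >= len(values):
        return statistics.median(values)  # small input: compute it exactly
    return statistics.median(rng.sample(values, sample_size))


rng = random.Random(0)
values = list(range(10001))

estimate = approx_median(values, 101, rng)  # looks at 101 of 10001 values
exact = statistics.median(values)
print(estimate, exact)
```

Because each tree can draw a different sample (or a different split dimension), trees built over the same data end up with different partitions, which is what makes searching several of them worthwhile.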
