Binary multiplication and addition in terms of elementary operations (AND,OR,XOR,SHIFT...) - binary-data

Let's assume we are given N1 and N2: two q-bit binary numbers. For simplicity, both are unsigned integers. Can we express the cost of multiplying or adding N1 and N2 in terms of the number of bit-wise operations (AND, OR, XOR, SHIFT) needed to perform the operation? A reasonable estimate would also be fine.
Any information, thoughts, or links are highly appreciated.
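As a rough way to get a feel for the counts, here is a minimal sketch that adds and multiplies non-negative integers using only AND, XOR and SHIFT, and counts how many such word-level operations it performs. The function names and the `ops` counter are made up for illustration. For two q-bit inputs the addition loop runs at most q + 1 times (3 operations per round), and shift-and-add multiplication performs up to q such additions, so the counts grow roughly like q and q^2 respectively. A gate-level count would differ, since each operation here acts on a whole word at once.

```python
def add_bitwise(a, b):
    """Return (a + b, ops) using only AND, XOR and left SHIFT on non-negative ints."""
    ops = 0
    while b:
        carry = a & b     # bit positions that generate a carry
        a = a ^ b         # sum of the two numbers, ignoring carries
        b = carry << 1    # carries move one position to the left
        ops += 3
    return a, ops


def mul_bitwise(a, b):
    """Return (a * b, ops) using shift-and-add on non-negative ints."""
    result, ops = 0, 0
    while b:
        if b & 1:                      # lowest bit of b set: add the shifted a
            result, add_ops = add_bitwise(result, a)
            ops += add_ops
        a <<= 1
        b >>= 1
        ops += 3                       # the AND test plus the two shifts
    return result, ops
```

For example, `add_bitwise(13, 11)` and `mul_bitwise(13, 11)` return the sum and product together with the number of operations used for those particular inputs.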


Quantum computing vs traditional base10 systems

This may show my naiveté but it is my understanding that quantum computing's obstacle is stabilizing the qubits. I also understand that standard computers use binary (on/off); but it seems like it may be easier with today's tech to read electric states between 0 and 9. Binary was the answer because it was very hard to read the varying amounts of electricity, components degrade over time, and maybe maintaining a clean electrical "signal" was challenging.
But wouldn't it be easier to try to solve the problem of reading varying levels of electricity so we can go from 2 inputs to 10, thereby increasing the smallest unit of storage and exponentially increasing the number of paths through the logic gates?
I know I am missing quite a bit (sorry the puns were painful) so I would love to hear why or why not.
Thank you
"Exponentially increasing the number of paths through the logic gates" is exactly the problem. More possible states for each n-ary digit means more transistors, larger gates and more complex CPUs. That's not to say no one is working on ternary and similar systems, but the reason binary is ubiquitous is its simplicity. For storage, more possible states also means we need more sensitive electronics for reading and writing, and a much higher error frequency during these operations. There's a lot of hype around using DNA (base-4) for storage, but this is more on account of the density and durability of the substrate.
You're correct, though, that your question is missing quite a bit: qubits are entirely different from classical information, whether we use bits or digits. Classical bits and trits respectively correspond to vectors like
Binary: |0> = [1,0]; |1> = [0,1];
Ternary: |0> = [1,0,0]; |1> = [0,1,0]; |2> = [0,0,1];
A qubit, on the other hand, can be a linear combination of classical states
Qubit: |Ψ> = α |0> + β |1>
where α and β are arbitrary complex numbers such that |α|^2 + |β|^2 = 1.
This is called a superposition, meaning even a single qubit can be in one of an infinite number of states. Moreover, unless you prepared the qubit yourself or received some classical information about α and β, there is no way to determine the values of α and β. If you want to extract information from the qubit you must perform a measurement, which collapses the superposition and returns |0> with probability |α|^2 and |1> with probability |β|^2.
We can extend the idea to qutrits (though, just like trits, these are even more difficult to effectively realize than qubits):
Qutrit: |Ψ> = α |0> + β |1> + γ |2>
These requirements mean that qubits are much more difficult to realize than classical bits of any base.
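To make the measurement rule above concrete, here is a small classical simulation, offered only as a sketch: it takes the amplitudes α and β as Python complex numbers, checks the normalization condition, and samples one measurement outcome. The function names are invented for this example, and a classical program like this can of course read α and β directly, which a real experiment cannot.

```python
import math
import random


def measurement_probabilities(alpha, beta):
    """Return (P(0), P(1)) for the state alpha|0> + beta|1>."""
    p0, p1 = abs(alpha) ** 2, abs(beta) ** 2
    if not math.isclose(p0 + p1, 1.0):
        raise ValueError("state is not normalized: |alpha|^2 + |beta|^2 must be 1")
    return p0, p1


def measure(alpha, beta, rng=random):
    """Simulate a single measurement; returns 0 or 1."""
    p0, _ = measurement_probabilities(alpha, beta)
    return 0 if rng.random() < p0 else 1
```

With α = 1/√2 and β = i/√2, both outcomes have probability 1/2, so repeated calls to `measure` return 0 and 1 about equally often.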

Sum of all the bits in a Bit Vector of Z3

Given a bit vector in Z3, I am wondering how can I sum up each individual bit of this vector?
a = BitVecVal(3, 2)
sum_all_bit(a) = 2
Are there any pre-implemented APIs/functions that support this? Thank you!
It isn't part of the bit-vector operations.
You can create an expression as follows:
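from functools import reduce
from z3 import BitVecVal, Concat, Extract

def sum_bits(b):
    n = b.size()  # assumes n >= 2 so the zero padding below is non-empty
    # Extract each bit as a 1-bit vector, then zero-pad it to n bits.
    bits = [Extract(i, i, b) for i in range(n)]
    bvs = [Concat(BitVecVal(0, n - 1), bit) for bit in bits]
    return reduce(lambda x, y: x + y, bvs)

print(sum_bits(BitVecVal(4, 7)))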
Of course, about log2(n) bits for the result will suffice if you prefer (floor(log2(n)) + 1, to be exact, since the sum can be as large as n).
The page:
has various algorithms for counting the bits; which can be translated to Z3/Python with relative ease, I suppose.
My favorite is:
which has the nice property that it loops as many times as there are set bits in the input. (But you shouldn't extrapolate from that to any meaningful complexity metric, as you do arithmetic in each loop, which might be costly. The same is true for all these algorithms.)
Having said that, if your input is fully symbolic, you can't really beat the simple iterative algorithm, as you can't short-cut the iteration count. Above methods might work faster if the input has concrete bits.
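One well-known method whose loop runs once per set bit is the `x & (x - 1)` trick, often attributed to Brian Kernighan. Whether that is the algorithm meant above is an assumption. The sketch below shows it on plain Python integers rather than Z3 terms, and the function name is made up for illustration.

```python
def count_set_bits(x):
    """Count the 1 bits of a non-negative integer, one loop iteration per set bit."""
    count = 0
    while x:
        x &= x - 1   # clears the lowest set bit of x
        count += 1
    return count
```

On a fully symbolic bit vector the loop condition cannot be evaluated ahead of time, which is why the simple per-bit sum is the practical choice there.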
So you're computing the Hamming Weight of a bit vector. Based on a previous question I had, one of the developers had this answer. Based on that original answer, this is how I do it today:
from math import ceil, log2
from z3 import Extract, Sum, ZeroExt
def HW(bvec):
    return Sum([ZeroExt(int(ceil(log2(bvec.size()))), Extract(i, i, bvec)) for i in range(bvec.size())])

Why is modulus operator slow?

Paraphrasing from the book "Programming Pearls" (about the C language on older machines, since the book is from the late 90s):
Integer arithmetic operations (+, -, *) can take around 10 nanoseconds whereas the % operator takes up to 100 nanoseconds.
Why is there that much difference?
How does a modulus operator work internally?
Is it the same as division (/) in terms of time?
The modulus/modulo operation is usually understood as the integer equivalent of the remainder operation - a side effect or counterpart to division.
Except for some degenerate cases (where the divisor is a power of the operating base - i.e. a power of 2 for most number formats) this is just as expensive as integer division!
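The power-of-2 special case can be illustrated with a short sketch: for a non-negative integer x, the remainder modulo 2^k is just the low k bits of x, so a single AND with a mask replaces the division. The function name is made up for this example.

```python
def mod_pow2(x, k):
    """Return x mod 2**k for a non-negative integer x using a single AND."""
    mask = (1 << k) - 1   # k low bits set, e.g. k=3 gives 0b111
    return x & mask
```

This only works because the divisor is a power of the base; for a general divisor there is no such mask.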
So the question is really, why is integer division so expensive?
I don't have the time or expertise to analyze this mathematically, so I'm going to appeal to grade school maths:
Consider the number of lines of working out in the notebook (not including the inputs) required for:
Equality: (Boolean operations) essentially none - in computer "big O" terms this is known as O(1)
addition: two, working right to left, one line for the output and one line for the carry. This is an O(N) operation
long multiplication: n*(n+1) + 2: two lines for each of the digit products (one for total, one for carry) plus a final total and carry. So O(N^2) but with a fixed N (32 or 64), and it can be pipelined in silicon to less than that
long division: unknown, depends upon the argument size - it's a recursive descent and some instances descend faster than others (1,000,000 / 500,000 requires fewer lines than 1,000 / 7). Also each step is essentially a series of multiplications to isolate the closest factors. (Although multiple algorithms exist). Feels like an O(N^3) with variable N
So in simple terms, this should give you a feel for why division and hence modulo is slower: computers still have to do long division in the same stepwise fashion that you did in grade school.
If this makes no sense to you; you may have been brought up on school math a little more modern than mine (30+ years ago).
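The grade-school procedure translates to binary as shift-and-subtract long division. The sketch below is only an illustration of that stepwise structure, not a description of any particular CPU's divider (real hardware typically uses more refined schemes), and the function name and `bits` parameter are assumptions. It produces one quotient bit per step, each step needs the remainder left by the previous one, and the quotient and remainder come out of the same loop, which is why % costs about the same as /.

```python
def divmod_shift_subtract(n, d, bits=32):
    """Binary long division of non-negative n (n < 2**bits) by d > 0.

    Returns (quotient, remainder).
    """
    if d == 0:
        raise ZeroDivisionError("division by zero")
    q, r = 0, 0
    for i in range(bits - 1, -1, -1):
        r = (r << 1) | ((n >> i) & 1)   # bring down the next bit of n
        if r >= d:                      # does the divisor fit?
            r -= d
            q |= 1 << i                 # record a 1 in the quotient
    return q, r
```

Unlike the carries in addition, these steps cannot simply be done side by side, because each comparison depends on the subtraction before it.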
The Order/Big O notation used above as O(something) expresses the complexity of a computation in terms of the size of its inputs, and expresses a fact about its execution time.
O(1) executes in constant (but possibly large) time. O(N) takes time proportional to the size of its data, so if the data is 32 bits it takes 32 times as long as a single O(1) step. O(N^2) takes N times N (N squared) such steps (or possibly N times MN for some constant M), and so on.
In the above working I have used O(N) rather than O(N^2) for addition since the 32 or 64 bits of the first input are calculated in parallel by the CPU. In a hypothetical 1 bit machine a 32 bit addition operation would be O(32^2) and change. The same order reduction applies to the other operations too.

How much storage to represent sparse matrix

I don't know how to solve this problem from Fundamentals of Data Structures in C, 2nd ed., ch. 2.5:
On a computer with w bits per word, how much storage is needed to represent a sparse matrix, A, with t nonzero terms?
I think the answer is 3*w*t because in a sparse matrix we just store the row, column and value of each term, so 3 times w*t. But someone says the answer is w^2 + t. I don't understand what they mean.
In the most common “general purpose” sparse matrix formats (CSR and CSC), for a matrix with t nonzeros, there are two integer arrays, one of length m+1 (where m is the number of rows for CSR, or columns for CSC) and one of length t, plus one array of floating-point numbers of length t. In practice, the size in bytes will depend on the sizes of the integer and floating-point representations. In a theoretical machine with one uniform word size for everything, the size would be 2t+m+1 words.
I fail to see how w^2+t could be correct or even related.
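The estimate in the question can be checked with a small sketch of the (row, column, value) triplet layout. It assumes one machine word per stored number and ignores any header holding the matrix dimensions. The function names are made up for illustration.

```python
def to_triplets(dense):
    """Return the nonzero terms of a dense matrix as (row, col, value) triplets."""
    return [(r, c, v)
            for r, row in enumerate(dense)
            for c, v in enumerate(row)
            if v != 0]


def triplet_storage(t, w):
    """Storage for t triplets on a machine with w bits per word: (words, bits)."""
    words = 3 * t          # one word each for row, column and value
    return words, words * w
```

So with t nonzero terms the triplet layout needs 3t words, which is 3*w*t bits, matching the figure proposed in the question.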

What is the most efficient(*) way of building a canonical huffman tree?

Assume A is an array where A[0] holds the frequency of 0-th letter of the alphabet.
What is the most efficient(*) way of calculating code lengths? Not sure, but I guess efficiency can be in terms of memory usage or steps required.
All I'm interested in is the array L where L[0] contains the code length (number of bits) of the 0-th letter of the alphabet, where the code comes from the canonical Huffman tree built out of the frequency array A.
If frequencies form a monotonic sequence, i.e. A[0]<=A[1]<=...<=A[n-1] or A[0]>=A[1]>=...>=A[n-1], then you can generate optimal code lengths in O(n) time and O(1) additional space. This algorithm requires only 2 simple passes over the array and it's very fast. A full description is given in [1].
If your frequencies aren't sorted, first you need to sort them and then apply the above algorithm. In this case time complexity is O(n log n) and an auxiliary array of n integers is needed to store sorted order - space complexity O(n).
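For comparison, here is a minimal sketch of the textbook heap-based way to get the same code lengths. It is not the in-place algorithm from [1]: it runs in O(n log n) time, uses O(n) extra space, and does not need sorted input. The function name and the choice to give a single-symbol alphabet a 1-bit code are assumptions made for this example.

```python
import heapq


def huffman_code_lengths(freqs):
    """Return L where L[i] is the Huffman code length of symbol i."""
    n = len(freqs)
    if n == 0:
        return []
    if n == 1:
        return [1]                      # assumption: a lone symbol gets 1 bit
    parent = [-1] * (2 * n - 1)         # leaves are 0..n-1, internal nodes n..2n-2
    heap = [(f, i) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    next_id = n
    while len(heap) > 1:
        f1, a = heapq.heappop(heap)     # two lightest subtrees
        f2, b = heapq.heappop(heap)
        parent[a] = parent[b] = next_id
        heapq.heappush(heap, (f1 + f2, next_id))
        next_id += 1
    # A parent is always created after its children, so it has a larger id.
    # Walking ids downwards from the root therefore fills depths top-down.
    depth = [0] * (2 * n - 1)
    for node in range(2 * n - 3, -1, -1):
        depth[node] = depth[parent[node]] + 1
    return depth[:n]
```

Given only the lengths, the canonical codes themselves can then be assigned by sorting symbols by (length, symbol index).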
[1] Alistair Moffat and Jyrki Katajainen, "In-Place Calculation of Minimum-Redundancy Codes".
