How to compute the upper part of a product of two bit-vectors?

I could use vectors of double the width, do the normal multiplication and then right shift, but that seems rather inefficient. Is there a better way? Perhaps split the vectors into four halves, do four full multiplications of the original width and combine only the upper part?
Alternatively, is there a solver which supports such an operation natively?

If you want to know the precise value of the upper part, then zero-extend to double-width, multiply, and grab the upper bits. Since the lower parts will impact every bit of the top part, you can't do any better, asymptotically. If you have a very specific use case that looks only at a specific set of bits in the upper part rather than all of them, then you might find a more efficient algorithm, but you'd have to derive it yourself for each bit you are interested in. I'm not aware of any solver that supports anything custom for this purpose.
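To make the two options from the question concrete, here is a minimal sketch in plain Python integers (not solver code). The function names mulhi_wide and mulhi_split are made up for this illustration, and the operands are treated as unsigned. The first function is the zero-extend, multiply and shift approach; the second combines four half-width partial products and gives the same answer, which is why it does not save anything asymptotically.

```python
def mulhi_wide(a, b, width):
    """Upper `width` bits of the unsigned product, via a double-width multiply."""
    mask = (1 << width) - 1
    wide = (a & mask) * (b & mask)        # the product fits in 2 * width bits
    return wide >> width


def mulhi_split(a, b, width):
    """Same result from four half-width partial products (width must be even)."""
    half = width // 2
    lo_mask = (1 << half) - 1
    a_lo, a_hi = a & lo_mask, (a >> half) & lo_mask
    b_lo, b_hi = b & lo_mask, (b >> half) & lo_mask
    ll, lh, hl, hh = a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi
    # Middle column: high half of ll plus the low halves of the cross terms.
    # Its overflow is the carry into the upper part of the result.
    mid = (ll >> half) + (lh & lo_mask) + (hl & lo_mask)
    return hh + (lh >> half) + (hl >> half) + (mid >> half)
```

Note that the split version still needs the low partial product ll to get the carry right, which matches the point above that the lower parts influence every upper bit.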
If you don't care about the value of the upper part, but you only care if the multiplication overflows or not, then z3 has custom built-in functions for that. See the paper: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/z3prefix.pdf
The primitives z3 provides are:
bvsmul_noovfl: Signed multiplication no overflow
bvsmul_noudfl: Signed multiplication no underflow
bvumul_noovfl: Unsigned multiplication no overflow
(The paper has details on how you can detect overflow/underflow for other operations as well.)
These operations directly allow you to determine (in an optimal way) if there's overflow/underflow in multiplication. When you don't care about the value of the upper part but merely want to know if an overflow/underflow happened, use these methods. (See the paper cited above for more details, especially Section 5.3.)
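For intuition, here is a reference model in plain Python of what I understand these three predicates to mean. This is only my reading of the semantics, not how z3 implements them, and the Python function names are invented for the sketch.

```python
def to_signed(x, width):
    """Interpret the low `width` bits of x as a two's-complement number."""
    x &= (1 << width) - 1
    return x - (1 << width) if x >> (width - 1) else x


def umul_no_overflow(a, b, width):
    """Assumed meaning of bvumul_noovfl: the unsigned product fits in `width` bits."""
    mask = (1 << width) - 1
    return ((a & mask) * (b & mask)) >> width == 0


def smul_no_overflow(a, b, width):
    """Assumed meaning of bvsmul_noovfl: the signed product is not above the maximum."""
    return to_signed(a, width) * to_signed(b, width) <= (1 << (width - 1)) - 1


def smul_no_underflow(a, b, width):
    """Assumed meaning of bvsmul_noudfl: the signed product is not below the minimum."""
    return to_signed(a, width) * to_signed(b, width) >= -(1 << (width - 1))
```

In the unsigned case the check is the same as asking whether the upper half of the double-width product is zero, which ties it back to the original question.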

Interpolation vs Average

I'm new to computer vision concepts and I'd like to know why, when we double the size of an image, we should use bilinear interpolation to fill in the pixels that have no value, instead of averaging the nearest pixels with known values.

I'm not sure I agree with the premise that you "should use bilinear interpolation". You shouldn't blindly use anything without thinking about it. For example, if your pixels represent the result of a classification and 1 represents wheat, and 2 represents water, and 3 represents barley, you certainly shouldn't take the average and assume that when you enlarge an image of wheat and barley that some ocean suddenly appears in the middle between the fields.
Bilinear interpolation is actually just averaging, except a) it is in 2 dimensions because images are inherently 2-dimensional and b) if you know you are nearer to one point than another, surely it isn't unreasonable to weight your "guesstimated" value (which, after all, you don't actually know) more towards the geometrically closer value?
I guess my answer is really that there are several types of interpolation, and you should apply some thinking to deciding which one is best for your particular circumstances. Sometimes you don't want to introduce new colours because of classification or palette issues, and in these circumstances you need "nearest neighbour". Sometimes "bilinear" is what you need, sometimes "bicubic".
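A tiny sketch of the wheat/water/barley example, in plain Python on a nested list. The helper names nearest and bilinear are mine, and the 2x2 label image is made up.

```python
def nearest(img, y, x):
    """Nearest-neighbour sample: always returns a value that exists in img."""
    return img[int(y + 0.5)][int(x + 0.5)]


def bilinear(img, y, x):
    """Bilinear sample: distance-weighted average of the four surrounding pixels."""
    y0, x0 = int(y), int(x)
    y1 = min(y0 + 1, len(img) - 1)
    x1 = min(x0 + 1, len(img[0]) - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


# 1 = wheat, 2 = water, 3 = barley
labels = [[1, 3],
          [1, 3]]

halfway = bilinear(labels, 0, 0.5)    # 2.0: "water" appears between the fields
closer = bilinear(labels, 0, 0.25)    # 1.5: weighted towards the nearer pixel
safe = nearest(labels, 0, 0.5)        # 3: only existing labels come out
```

Halfway between two pixels the bilinear result is exactly the plain average; away from the midpoint it leans towards the closer pixel, which is the weighting described above.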

Does optimizing a bijection-transformed value affect performance or accuracy?

If I want to optimize a function with respect to some constrained value, I can find a bijective map between an unconstrained space and the constrained space, then optimize the composition of the original function and the bijective map with respect to the unconstrained value.
Does optimizing in a different space affect the performance or accuracy of optimization? And does it vary between bijective maps?
My use case is training constrained Gaussian process model hyperparameters in GPflow using TensorFlow Probability's bijectors.

If I understand you correctly, you might have for example some variable that is constrained to be positive and want to optimize it. And for that you train the variable in the unconstrained space?
That would be pretty common in machine learning where you for example enforce a variance (of let's say the likelihood) to be positive by taking the exponent of the unconstrained value.
I guess the effect on the optimization very much depends on how you optimize it. For gradient based methods it does have an effect, and sometimes small tricks are helpful to improve those issues (e.g. shifting, so that your transformation is tf.exp(shift_val + unconstrained_variable) ).
And yes, afaik it varies between different mappings. In my example, the softplus and exponential transformations result in different gradients. Though I'm not sure if there's a consensus on which one is preferable.
I'd just try a few different ones. As long as it doesn't lead to numerical issues, either transformation/bijection should be fine.
EDIT: just to clarify. The bijection should not affect the solution space, just the optimization path itself.
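Here is a toy illustration of that last point, in plain Python rather than TensorFlow so it runs anywhere. The objective f(x) = (x - 2)**2, the learning rate and the helper name minimize are all made up for the sketch. Both transformations end at the same constrained optimum, but the intermediate iterates differ because the chain-rule factor differs.

```python
import math


def softplus(u):
    return math.log1p(math.exp(u))


def sigmoid(u):
    """Derivative of softplus."""
    return 1.0 / (1.0 + math.exp(-u))


def minimize(transform, transform_grad, steps, lr=0.05, u=0.0):
    """Gradient descent on f(x) = (x - 2)**2, where x = transform(u) > 0."""
    for _ in range(steps):
        x = transform(u)
        grad_u = 2.0 * (x - 2.0) * transform_grad(u)   # chain rule: df/dx * dx/du
        u -= lr * grad_u
    return transform(u)


# Early on, the two paths are in different places...
early_exp = minimize(math.exp, math.exp, steps=5)
early_softplus = minimize(softplus, sigmoid, steps=5)

# ...but given enough steps they agree on the solution x = 2.
final_exp = minimize(math.exp, math.exp, steps=2000)
final_softplus = minimize(softplus, sigmoid, steps=2000)
```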

How to defend thresholding technique

On a job for a customer, I am locating items within a grayscale scene with nonuniform background illumination. Once the items are located, I need to do another search within each one for details. The items are easy enough to locate by masking with the output of a variance filter; and within the items, if the threshold is correct, the details are easy to locate as well. But the mean and contrast of these items varies substantially.
I played around with threshold calculation for a while, and none of the techniques I implemented is perfect; but the one that turns out simplest, as accurate as any other, and quite low cost, is to take the mean pixel value and add one standard deviation.
My question is: is there some analytical way to defend this calculation other than "it works well"? I mean, I did sort of fall on this technique accidentally (only later did I find this answer), and using it seems arbitrary.
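For reference, the calculation I'm describing is just this (sketched in Python on a flat list of pixel values; the function names and the multiplier k are only for illustration, and I use k = 1):

```python
from statistics import mean, pstdev


def mean_plus_std_threshold(pixels, k=1.0):
    """Threshold = mean pixel value plus k population standard deviations."""
    return mean(pixels) + k * pstdev(pixels)


def binarize(pixels, threshold):
    """1 where the pixel is above the threshold, 0 elsewhere."""
    return [1 if p > threshold else 0 for p in pixels]


# Made-up item: mostly dark pixels with two bright detail pixels.
item = [10, 10, 10, 10, 10, 10, 10, 10, 200, 200]
threshold = mean_plus_std_threshold(item)
mask = binarize(item, threshold)
```

Because the threshold is computed per item, it moves with each item's own mean and contrast.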

Homography and projective transformation

I'm trying to write code that performs a projective transformation, but with more than 4 key points. I found this helpful guide, but it uses 4 reference points:
https://math.stackexchange.com/questions/296794/finding-the-transform-matrix-from-4-projected-points-with-javascript
I know that MATLAB has a function, cp2tform, that handles this, but I haven't found a way to do it myself so far.
Can anyone give me some guidance on how to do so? I can solve the equations using least squares, but I'm stuck since I end up with a matrix that is larger than 3*3 and I can't multiply it with the homogeneous coordinates.
Thanks

If you have more than four control points, you have an overdetermined system of equations. There are two possible scenarios. Either your points are all compatible with the same transformation. In that case, any four points can be used, and the rest will match the transformation exactly. At least in theory. For the sake of numeric stability you'd probably want to choose your points so that they are far from being collinear.
Or your points are not all compatible with a single projective transformation. In this case, all you can hope for is an approximation. If you want the best approximation, you'll have to be more specific about what “best” means, i.e. some kind of error measure. Measuring things in a projective setup is inherently tricky, since there are usually a lot of arbitrary decisions involved.
What you can try is fixing one matrix entry (e.g. the lower right one to 1), then writing the conditions for the remaining 8 entries as a system of linear equations, and performing a least squares approximation. But the choice of matrix representative (i.e. fixing one entry here) affects the least squares error measure while it has no effect on the geometric meaning, so this is a pretty arbitrary choice. If the lower right entry of the desired matrix should happen to be zero, your computation will run into numeric problems due to overflow.
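A minimal sketch of that last approach, in plain Python so that it is self-contained. The function names are mine, and I solve the least squares problem through the normal equations with a small Gauss-Jordan solver; with a numerical library you would call its least squares routine instead. Each point pair (x, y) -> (u, v) contributes two linear equations in the 8 unknown entries, with the lower right entry fixed to 1.

```python
def solve_linear(A, b):
    """Solve the square system A x = b by Gauss-Jordan with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_homography(src, dst):
    """Least squares 3x3 projective matrix with H[2][2] fixed to 1."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    # Normal equations (A^T A) h = A^T b for the overdetermined system.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    Atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(AtA, Atb)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, point):
    """Map (x, y) through H, dividing by the homogeneous coordinate."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

The result is always a 3x3 matrix no matter how many points go in; the extra points only add rows to the linear system. The error being minimised here is the algebraic one tied to the choice of fixing the lower right entry, as discussed above.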

Sine Table Interpolation

I want to put together an SDR system that initially tunes AM, and later FM etc.
The system I am planning to use to do this will have a sine lookup table for Direct Digital Synthesis (DDS).
In order to tune properly I expect to need to be able to precisely control the frequency of the sine wave fed to the Mixer (multiplier in this case). I expect that linear interpolation will be close, but think a non-linear method will provide better results.
What is a good and fast interpolation method to use for sine tables? Multiplication and addition are cheap on the target system; division is costly.
Edit:
I am planning on implementing constants with multiply/shift functions to normalize the constants to scaled integers. Intermediate values will use wide adds, and multiplies will use 18 or 17 bits. Floating point "pre-computation" can be used, but not on the target platform. When I say "division is costly" I mean that it has to be implemented using the multipliers and a lot of code. It's not unthinkable, but should be avoided. However, true floating point IEEE methods would take a significant amount of resources on this platform, as well as a custom implementation.
Any SDR experiences would be helpful.

If you don't get very good results with linear interpolation you can try the trigonometric relations.
Sum and Difference Formulas
sin(A+B)=sinA*cosB + cosA*sinB
sin(A-B)=sinA*cosB - cosA*sinB
cos(A+B)=cosA*cosB - sinA*sinB
cos(A-B)=cosA*cosB + sinA*sinB
and you can have precalculated sin and cos values for A, B ranges, ie
A range: 0, 10, 20, ... 90
B range: 0.01 ... 0.99
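A rough sketch of the idea in Python. For the example I assume a coarse table at whole degrees and a fine table at hundredths of a degree (my own choice of ranges so that every angle is covered); floating point is used for readability, and on the target the tables would hold scaled integers.

```python
import math

# Coarse tables: A = 0, 1, ..., 90 degrees
COARSE_SIN = [math.sin(math.radians(a)) for a in range(91)]
COARSE_COS = [math.cos(math.radians(a)) for a in range(91)]

# Fine tables: B = 0.00, 0.01, ..., 0.99 degrees
FINE_SIN = [math.sin(math.radians(k / 100)) for k in range(100)]
FINE_COS = [math.cos(math.radians(k / 100)) for k in range(100)]


def table_sin(hundredths):
    """sin of an angle given in hundredths of a degree (0..9000)."""
    a, b = divmod(hundredths, 100)
    # sin(A+B) = sinA*cosB + cosA*sinB: two lookups per table, two multiplies, one add
    return COARSE_SIN[a] * FINE_COS[b] + COARSE_COS[a] * FINE_SIN[b]
```

That is 91 + 100 entries per function instead of 9001 for a direct table at the same resolution, and no interpolation error at the tabulated angles.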

table interpolation for smooth functions = ick hurl bleah. IMHO I would only use table interpolation on some really weird function, or where you absolutely needed to ensure you avoid discontinuities (note that the derivatives for interpolated tables are discontinuous though). By the time you finish doing table lookups and the required interpolation code, you could have already evaluated a polynomial or two, at least if multiplication doesn't cause you too much heartburn.
IMHO you're much better off using Chebyshev approximation for each segment (e.g. -90 to +90 degrees, or -45 to +45 degrees, and then other segments of the same width) of the sine waveform, and picking the minimum degree polynomial that reduces your error to a desired value. If the segment is small enough you could get away with a quadratic or maybe even a linear polynomial; there's tradeoffs between accuracy, and # of segments, and degree of polynomial.
See my post in this other question, it'll save you the trouble of calculating coefficients (at least if you believe my math).
(edit: in case this wasn't clear, you do the Chebyshev approximation at design-time on your favorite high-powered PC, so that at run-time you can use a dirtbag microcontroller or FPGA or whatever with a simple polynomial of degree 1-4. Don't go over degree 4 unless you know what you're doing, 3 or below would be better.)
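To give a flavour of the design-time step, here is a small Python sketch (function names are mine; it fits one segment from -90 to +90 degrees and reports the worst-case error for a given degree). It evaluates the result as a Chebyshev series; for the run-time side you would convert to whatever polynomial form suits your target.

```python
import math


def cheb_coeffs(f, a, b, degree, n_nodes=32):
    """Chebyshev series coefficients of f on [a, b], truncated at `degree`."""
    coeffs = []
    for j in range(degree + 1):
        total = 0.0
        for k in range(n_nodes):
            theta = math.pi * (k + 0.5) / n_nodes
            x = 0.5 * (a + b) + 0.5 * (b - a) * math.cos(theta)
            total += f(x) * math.cos(j * theta)
        coeffs.append(2.0 * total / n_nodes)
    coeffs[0] *= 0.5            # the constant term carries a factor of 1/2
    return coeffs


def cheb_eval(coeffs, a, b, x):
    """Evaluate sum(c_j * T_j(t)) with t = x mapped from [a, b] to [-1, 1]."""
    t = (2.0 * x - a - b) / (b - a)
    t_prev, t_cur = 1.0, t
    total = coeffs[0]
    for c in coeffs[1:]:
        total += c * t_cur
        t_prev, t_cur = t_cur, 2.0 * t * t_cur - t_prev   # T_{j+1} = 2t T_j - T_{j-1}
    return total


def max_error(degree, a=-math.pi / 2, b=math.pi / 2, samples=1000):
    """Worst error of the truncated series against math.sin on a sample grid."""
    coeffs = cheb_coeffs(math.sin, a, b, degree)
    xs = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(cheb_eval(coeffs, a, b, x) - math.sin(x)) for x in xs)
```

Calling max_error for a few degrees and segment widths is a quick way to explore the tradeoff between accuracy, number of segments and polynomial degree.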

Why a table? This very fast function has its worst noise peak at -90db when the signal is at -20db. That's crazy good.
For resampling of audio, I always use one of the interpolators from the Elephant paper. This was discussed in a previous SO question.
If you're on a processor that doesn't have fp, you can still do these things, but they are harder. I've been there. I feel your pain. Good luck! I used to do conversions for fp to integer for fun, but now you'd have to pay me to do it. :-)
Cool online references that apply to your problem:
http://www.audiomulch.com/~rossb/code/sinusoids/
http://www.dattalo.com/technical/theory/sinewave.html
Edit: additional thoughts based on your comments
Since you're working on a tricky processor, maybe you should look into how to make your sine table have more angles to look up, but still keep it small.
Suppose you break a quadrant into 90 pieces (in reality, you'd probably use 256 pieces, but let's keep it 90 for familiarity and clarity). Encode those as 16 bits. That's 180 bytes of table so far.
Now, for every one of those degrees, we're going to have 9 (in reality probably 8 or 16) in-between points.
Let's take the range between 3 degrees and 4 degrees as an example.
sin(3)=0.052335956 //this will be in your table as a 16-bit number
sin(4)=0.069756474 //this will be in your table as a 16-bit number
so we're going to look at sin(3.1)
sin(3.1)=0.054978813 //we're going to be tricky and store the result
// in 8 bits as a percentage of the distance between
// sin(3) and sin(4)
What you want to do is figure out how sin(3.1) fits in between sin(3) and sin(4). If it's half way between, code that as a byte of 128. If it's a quarter of the way between, code that as 64.
That's an additional 90*9 = 810 bytes, and you've encoded down to a tenth of a degree in 16-bit res in only 180+90*9 = 990 bytes. You can extend as needed (maybe going up to 32-bit angles and 16-bit tween angles) and linearly interpolate in between very quickly. To minimize storage space, you're taking advantage of the fact that consecutive values are close to each other.
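Roughly, in Python (a sketch only; the names are mine, and for simpler indexing I store 10 bytes per degree including the zero offset, rather than the 9 counted above):

```python
import math

STEPS = 10   # in-between positions per degree (tenths of a degree)


def _sin_deg(deg):
    return math.sin(math.radians(deg))


# 16-bit sine values at whole degrees, 0..90
COARSE = [round(_sin_deg(d) * 65535) for d in range(91)]

# For each degree d and tenth k: where sin(d + k/10) sits between sin(d)
# and sin(d + 1), as a byte (0 = at sin(d), 128 = half way, 64 = a quarter).
TWEEN = [
    [min(255, round(256 * (_sin_deg(d + k / STEPS) - _sin_deg(d))
                    / (_sin_deg(d + 1) - _sin_deg(d))))
     for k in range(STEPS)]
    for d in range(90)
]


def table_sin(tenths):
    """sin of an angle in tenths of a degree (0..900), scaled to 0..65535."""
    d, k = divmod(tenths, STEPS)
    if d == 90:
        return COARSE[90]
    span = COARSE[d + 1] - COARSE[d]
    # One multiply, one shift, one add at run time.
    return COARSE[d] + ((span * TWEEN[d][k]) >> 8)
```

The divmod is only there for the example; with a power-of-two number of in-between points (8 or 16, as suggested above) it turns into a shift and a mask.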
Edit 2: better way to encode the in-between angles in a table
I just remembered that when I did this, I ended up very compactly expressing the difference between the expected value according to linear interpolation and the actual value. This error is always in the same direction.
I first calculated the maximum error in the range and then based the scale on that.
Worked great. I feel like I should do the code in a blog entry to illustrate. :-)

Interpolation in a sine table is effectively resampling. Obviously you can get perfect results by a single call to sin, so whatever your solution is it needs to outperform that. For fixed-filter resampling, you're still going to only have a fixed set of available points (a 3:1 upsampler means you'll have 2 new points available between each point in your table). How expensive is memory on the target system? My primary recommendation is simply improve the table resolution and use linear interpolation. You'll get the same results as a smaller table and simple upsample but with less computational overhead.
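A sketch of what that could look like with a DDS-style phase accumulator, in Python integers. The bit widths, amplitude and function names are assumptions for the example; the point is that the run-time path uses only a multiply, an add and shifts.

```python
import math

TABLE_BITS = 8                       # 256 table entries per full cycle
FRAC_BITS = 16                       # phase bits used for interpolation
PHASE_BITS = TABLE_BITS + FRAC_BITS  # 24-bit phase accumulator
AMPLITUDE = 32767

# One extra entry so that index + 1 never runs off the end of the table.
SINE = [round(AMPLITUDE * math.sin(2 * math.pi * i / (1 << TABLE_BITS)))
        for i in range((1 << TABLE_BITS) + 1)]


def dds_sample(phase):
    """Linearly interpolated sine for a PHASE_BITS-wide phase value."""
    phase &= (1 << PHASE_BITS) - 1
    index = phase >> FRAC_BITS
    frac = phase & ((1 << FRAC_BITS) - 1)
    y0, y1 = SINE[index], SINE[index + 1]
    return y0 + (((y1 - y0) * frac) >> FRAC_BITS)


def tuning_word(freq_hz, sample_rate_hz):
    """Phase increment per sample; a design-time value, so the division is harmless."""
    return round(freq_hz * (1 << PHASE_BITS) / sample_rate_hz)
```

Each output sample is dds_sample(phase) followed by phase += tuning_word(...), so the frequency resolution comes from the accumulator width rather than from the table size.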

Have you considered using the Taylor series for the trig functions (found here)? This involves multiplication and division but depending on how your numbers are represented you may be able to turn the division into multiplication (or bit shifts if you're very lucky). You can compute as many terms of the series as you need and get your precision that way.
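For example, a four-term series with the divisions folded into precomputed reciprocals might look like this in Python. It is in floating point for readability; on an integer-only target the constants would be scaled integers, and the function name is just for the sketch.

```python
import math

# Reciprocals computed ahead of time, so no division happens at run time.
INV_6 = 1.0 / 6.0
INV_20 = 1.0 / 20.0
INV_42 = 1.0 / 42.0


def taylor_sin(x):
    """sin(x) ~ x - x^3/3! + x^5/5! - x^7/7!, in nested form, for |x| <= pi/2."""
    x2 = x * x
    return x * (1.0 - x2 * INV_6 * (1.0 - x2 * INV_20 * (1.0 - x2 * INV_42)))


# The error is largest at the end of the range and is bounded by the first
# omitted term, x^9 / 9!.
worst_case = abs(taylor_sin(math.pi / 2) - 1.0)
```

Adding the next term (a factor of 1/72 in the nested form) shrinks the error further if you need it.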
Alternately if this sine wave is going to be an analog signal at some point then you could just use a lookup table approach and use an analog filter to remove the sampling frequency from the resulting waveform. If your sampling frequency is 100 times the sine frequency it will be easy to remove. You'll need a variable filter to do this. I've never done such a thing but I know there's digital potentiometers that take a binary number and change their resistance. That could be the basis of a variable RC filter - probably with some op-amps for gain, etc.
Good luck!

People have written some amazingly clever code for quickly calculating sin() on systems with tiny amounts of memory that don't even have a hardware multiply instruction, much less a division instruction.
In order of increasing complexity:
Use a square wave. Many AM radios use square waves in their ring demodulator, and I fail to see why your AM demodulator requires anything more complicated.
Approximate sin() by looking up the "closest value" in a raw table of 256 values per quarter-cycle. Yes, you see horrible-looking stair-steps, but (with a little bit of analog filtering) this often works well. (In fact, this is often overkill, and a much shorter table is adequate).
Approximate sin() by looking up the 2 closest values in a raw table, and linearly interpolating between them.
Approximate sin() with 16 short, equally-spaced-in-x cubic splines per quarter-cycle, which "gives better than 16-bit precision" for sin(x).
Wikibooks: Fixed-Point Numbers links to some clever implementations of the last 3.
