Solving magic squares in Z3


I'm new to Z3 and, as an exercise, attempted a magic square solver by adapting an existing sudoku solver (http://lauri.võsandi.com/tub/qaoes/z3.html). I supply no facts (i.e. no specific numbers in specific cells) other than that the sums of all rows, columns and main diagonals are equal, and that the entries are distinct and in the range [1, N*N]. It works fine for squares up to size 4. Any larger, though, and I give up waiting for a solution.
Is this normal? Or would experienced Z3 programmers suggest that my implementation has issues, since problems of this size should be solvable?
Thanks.

You might have better results by expressing the entry in each cell as a BitVec() variable rather than an Int() variable.
See https://github.com/0vercl0k/z3-playground/blob/master/magic_square_z3.py for an example implementation. That implementation is able to find a solution for a 5x5 magic square in 1 second, a 6x6 magic square in 13 seconds, and a 7x7 magic square in 24 seconds (on my computer), so it seems to be doing significantly better than your formulation.

Related

Particle Swarm Optimisation: Converges to local optima too quickly in high dimension space

In a portfolio optimisation problem, I have a high-dimensional (n=500) space with upper and lower bounds of [0 - 5,000,000]. With PSO I am finding that the solution converges quickly to a local optimum rather than the global one, and I have narrowed the problem down to a number of areas:
Velocity: Particle velocity rapidly decays to extremely small step sizes [0-10] in the context of the upper/lower bounds [0 - 5,000,000]. One plug I have found is that I could change the velocity update function to a binary step size [e.g. 250,000] by using a sigmoid function but this clearly is only a plug. Any recommendations on how to motivate the velocity to remain high?
Initial Feasible Solutions: When initialising 1,000 particles, I might find that only 5% are feasible solutions in the context of my constraints. I thought that I could improve the search space by re-running the initialisation until all particles start off in a feasible space but it turns out that this actually results in a worse performance and all the particles just stay stuck close to their initialisation vector.
With respect to my parameters, w1=c1=c2=0.5. Is this likely to be the source of both problems?
I am open to any advice on this: in theory it should be a good approach to portfolio optimisation, but in practice I am not seeing this.

Consider changing the parameters. Using w=0.5 'stabilizes' the particles and thus prevents them from escaping local optima, because the swarm has already converged. Furthermore, I would suggest setting c1 and c2 larger than 1 (I think 2 is the suggested value), and perhaps making c1 (the tendency to move toward the global best) slightly smaller than c2 to prevent overcrowding on one solution.
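A minimal global-best PSO sketch showing where these parameters enter the velocity update. To avoid confusion over which coefficient is which, the pull toward the particle's own best is called `c_personal` and the pull toward the swarm's best is `c_global` (the answer's c1). The linearly decreasing inertia (0.9 to 0.4) and the velocity cap `vmax_frac` are assumptions added here, not part of the answer; the cap is one common way to keep steps proportional to the search range.

```python
import random


def pso_minimize(f, dim, lo, hi, n_particles=30, iters=300,
                 w_start=0.9, w_end=0.4, c_personal=2.0, c_global=1.8,
                 vmax_frac=0.2, seed=0):
    """Global-best PSO with linearly decreasing inertia and velocity clamping."""
    rng = random.Random(seed)
    vmax = vmax_frac * (hi - lo)  # cap on the per-coordinate step size
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-vmax, vmax) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(iters):
        # High inertia early keeps particles exploring; lower inertia later lets them settle.
        w = w_start - (w_start - w_end) * t / max(iters - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c_personal * r1 * (pbest[i][d] - pos[i][d])
                     + c_global * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Constraint handling for the portfolio problem (e.g. a penalty term inside `f`) is left out; the sketch only shows the update rule.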
Anyway, have you tried running PSO with a larger number of particles? People usually use 100-200 particles to solve a 2-10 dimensional problem, so I don't think 1,000 particles in a 500-dimensional space will cut it. I would also suggest a more advanced initialization method instead of a normal or uniform distribution (e.g. chaotic maps, Sobol sequences, Latin hypercube sampling).
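As an illustration of the last suggestion, here is a small pure-Python Latin hypercube sampler; `latin_hypercube` is a hypothetical helper name. Each axis is cut into `n_points` equal strata and every stratum receives exactly one point, which spreads the initial swarm more evenly than independent uniform draws. (SciPy's `scipy.stats.qmc` module also offers Latin hypercube and Sobol samplers if a dependency is acceptable.)

```python
import random


def latin_hypercube(n_points, dim, lo, hi, seed=0):
    """Return n_points points in [lo, hi)^dim with one point per stratum on every axis."""
    rng = random.Random(seed)
    width = (hi - lo) / n_points
    columns = []
    for _ in range(dim):
        # One uniform sample inside each of the n_points equal-width strata...
        strata = [lo + (k + rng.random()) * width for k in range(n_points)]
        # ...then shuffle so strata are paired randomly across dimensions.
        rng.shuffle(strata)
        columns.append(strata)
    return [list(p) for p in zip(*columns)]
```

For the question's setting this would be something like `latin_hypercube(1000, 500, 0.0, 5_000_000.0)`; feasibility with respect to the portfolio constraints is not addressed.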

Homography and projective transformation

I'm trying to write code that will do a projective transformation, but with more than 4 key points. I found this helpful guide, but it uses 4 points of reference:
https://math.stackexchange.com/questions/296794/finding-the-transform-matrix-from-4-projected-points-with-javascript
I know that MATLAB has a function, cp2tform, that handles this, but I haven't found a way to do it so far.
Can anyone give me some guidance on how to do this? I can solve the equations using least squares, but I'm stuck because I end up with a matrix that is larger than 3x3, and I can't multiply it with the homogeneous coordinates.
Thanks

If you have more than four control points, you have an overdetermined system of equations. There are two possible scenarios. Either your points are all compatible with the same transformation. In that case, any four points can be used, and the rest will match the transformation exactly. At least in theory. For the sake of numeric stability you'd probably want to choose your points so that they are far from being collinear.
Or your points are not all compatible with a single projective transformation. In this case, all you can hope for is an approximation. If you want the best approximation, you'll have to be more specific about what “best” means, i.e. some kind of error measure. Measuring things in a projective setup is inherently tricky, since there are usually a lot of arbitrary decisions involved.
What you can try is fixing one matrix entry (e.g. setting the lower right one to 1), then writing the conditions for the remaining 8 entries as a system of linear equations, and performing a least squares approximation. But the choice of matrix representative (i.e. fixing one entry here) affects the least squares error measure while it has no effect on the geometric meaning, so this is a pretty arbitrary choice. If the lower right entry of the desired matrix should happen to be zero, your computation will run into numeric problems due to overflow.
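A minimal NumPy sketch of that recipe, assuming H[2,2] = 1 (the function names are hypothetical). Each correspondence (x, y) -> (u, v) gives two linear equations after multiplying out the projective denominator: x*h11 + y*h12 + h13 - x*u*h31 - y*u*h32 = u, and likewise for v with h21, h22, h23. With n >= 4 points this is a 2n x 8 system, solved with `np.linalg.lstsq`, and the 8 results are reshaped back into an ordinary 3x3 matrix, which answers the "matrix larger than 3x3" problem. Note that this minimizes the algebraic residual, i.e. exactly the arbitrary error measure described above, not a geometric reprojection error.

```python
import numpy as np


def fit_homography(src, dst):
    """Least-squares 3x3 projective transform mapping src -> dst, with H[2,2] fixed to 1."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, rearranged to be linear in h
        rows.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        rhs.append(u)
        # v * (h31 x + h32 y + 1) = h21 x + h22 y + h23
        rows.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        rhs.append(v)
    h, *_ = np.linalg.lstsq(np.array(rows, float), np.array(rhs, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H, pts):
    """Map 2-D points through H using homogeneous coordinates."""
    pts = np.asarray(pts, float)
    homog = np.column_stack([pts, np.ones(len(pts))]) @ H.T  # rows are (x', y', w')
    return homog[:, :2] / homog[:, 2:3]                      # divide by w'
```

If the points are all consistent with one transformation, this recovers it exactly (up to rounding); otherwise it returns the least-squares compromise.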

What kind of heuristic should I use with BFS to solve this 'game' (find a path)?

I want to solve a 'game'.
I have 5 circles; each circle can be rotated left or right (by 90 degrees).
Example:
Goal: 1,2,3,....,14,15,16
Ex. of starting situations: 16,15,14,...,3,2,1
I'm using BFS to find a path, but I can't come up with a good heuristic function (none of my attempts work well). I tried Manhattan distance and others... (Maybe the idea is good but something is wrong with my implementation.) Please help!

One trick you might try is to do a breadth-first search backward from the goal state. Stop it early. Then you can terminate your (forward from the initial state) search once you've hit a state seen by the backward search.
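A sketch of this meet-in-the-middle idea. The question's figure is missing, so the board below is an assumption: a 4x4 grid (cells numbered row-major 0..15) with five 2x2 "wheels" at the four corners and in the centre; `WHEELS`, `rotate`, `neighbors` and `bidirectional_search` are hypothetical names. Because the backward pass explores every state within `back_depth` moves, the first forward state found in it lies on a shortest solution.

```python
from collections import deque

# Hypothetical layout: top-left (row, col) of each 2x2 wheel on a 4x4 board.
WHEELS = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]


def rotate(state, wheel, clockwise):
    """Rotate the four cells of one wheel by 90 degrees."""
    r, c = WHEELS[wheel]
    ring = [r * 4 + c, r * 4 + c + 1, (r + 1) * 4 + c + 1, (r + 1) * 4 + c]  # clockwise order
    shift = 1 if clockwise else -1
    out = list(state)
    for i, cell in enumerate(ring):
        out[ring[(i + shift) % 4]] = state[cell]
    return tuple(out)


def neighbors(state):
    for wheel in range(len(WHEELS)):
        for clockwise in (True, False):
            yield (wheel, clockwise), rotate(state, wheel, clockwise)


def bidirectional_search(start, goal, back_depth=4):
    # 1) Backward BFS from the goal, stopped after back_depth complete layers.
    toward_goal = {goal: None}  # state -> (move, next state one step closer to goal)
    frontier = [goal]
    for _ in range(back_depth):
        nxt = []
        for s in frontier:
            for (wheel, cw), t in neighbors(s):
                if t not in toward_goal:
                    toward_goal[t] = ((wheel, not cw), s)  # the inverse move leads back to s
                    nxt.append(t)
        frontier = nxt
    # 2) Forward BFS from the start until it hits a state seen by the backward pass.
    parent = {start: None}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s in toward_goal:
            path, cur = [], s
            while parent[cur] is not None:  # start -> meeting state
                move, cur = parent[cur]
                path.append(move)
            path.reverse()
            cur = s
            while toward_goal[cur] is not None:  # meeting state -> goal
                move, cur = toward_goal[cur]
                path.append(move)
            return path
        for move, t in neighbors(s):
            if t not in parent:  # visited set: never expand a position twice
                parent[t] = (move, s)
                queue.append(t)
    return None
```

The trade-off is memory: the backward table grows roughly tenfold per layer with 10 moves per state, so `back_depth` is bounded by available RAM.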
Sum of Manhattan distances from pieces to their goals is a decent baseline heuristic for the forward A* search. You can do rather better by adding up the number of turns needed to get 1-8 into their places to the number of turns needed to get 9-16 into their places; each of these state spaces is small enough (half a billion states or so) to precompute.
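A minimal A* sketch with the sum-of-Manhattan-distances heuristic, on the same assumed 4x4 board with five 2x2 wheels (all names hypothetical). One caveat worth knowing: a single rotation moves four tiles by one cell each, so the raw sum can overestimate and A* may then return a longer-than-optimal path. Dividing the sum by 4 gives an admissible and consistent bound, which is what the usage below passes in.

```python
import heapq
from itertools import count

WHEELS = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]  # assumed 2x2 wheels on a 4x4 board


def rotate(state, wheel, clockwise):
    r, c = WHEELS[wheel]
    ring = [r * 4 + c, r * 4 + c + 1, (r + 1) * 4 + c + 1, (r + 1) * 4 + c]  # clockwise
    shift, out = (1 if clockwise else -1), list(state)
    for i, cell in enumerate(ring):
        out[ring[(i + shift) % 4]] = state[cell]
    return tuple(out)


def neighbors(state):
    for wheel in range(len(WHEELS)):
        for clockwise in (True, False):
            yield (wheel, clockwise), rotate(state, wheel, clockwise)


def manhattan_sum(state):
    """Sum over tiles of grid distance to the goal cell (tile k belongs in cell k-1)."""
    total = 0
    for cell, tile in enumerate(state):
        goal_cell = tile - 1
        total += abs(cell // 4 - goal_cell // 4) + abs(cell % 4 - goal_cell % 4)
    return total


def astar(start, goal, h):
    tie = count()  # tie-breaker so heapq never has to compare states
    open_heap = [(h(start), next(tie), 0, start)]
    best_g = {start: 0}
    parent = {start: None}
    while open_heap:
        _, _, g, s = heapq.heappop(open_heap)
        if s == goal:
            path = []
            while parent[s] is not None:
                move, s = parent[s]
                path.append(move)
            return path[::-1]
        if g > best_g[s]:
            continue  # stale heap entry: a cheaper route to s was already found
        for move, t in neighbors(s):
            ng = g + 1
            if ng < best_g.get(t, float("inf")):
                best_g[t] = ng
                parent[t] = (move, s)
                heapq.heappush(open_heap, (ng + h(t), next(tie), ng, t))
    return None
```

Usage: `astar(start, tuple(range(1, 17)), lambda s: manhattan_sum(s) / 4)`. The pattern-database refinement from the answer would replace `h` with a lookup in precomputed tables.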

One heuristic that you could use is the cumulative number of turns that it takes to move each individual segment to its designated spot. The individual values would range from zero (the item is in its spot) to five (moving corner to corner). The total for the goal configuration is zero.
One has to be careful using this heuristic, because going from the initial configuration to the desired configuration may require steps when the cumulative number of turns increases after a move.
Finding a solution may require an exhaustive search. You need to memoize or use another DP technique to avoid solving the same position multiple times.

A simple conservative (admissible) heuristic would be:
For each number 1 <= i <= 16, find the minimum number of rotations needed to put i back in its correct position (disregarding all other numbers)
Take the maximum over all these minimums.
This amounts to reporting the minimum number of rotations needed to position the "worst" number correctly, and will therefore never overestimate the number of moves needed (since fixing all numbers' positions simultaneously requires at least as many moves as fixing any one of them).
It may, however, underestimate the number of moves needed by a long way. You can get more sophisticated by calculating, for each number 1 <= i <= 16 and for each wheel 1 <= j <= 5, the minimum number of rotations of wheel j needed by any sequence of moves that positions i correctly. For each wheel j, you can then take a separate maximum over all numbers i, and finally add these 5 maxima together, since they are all independent. (This may be less than the previous heuristic, but you are always allowed to take the greater of the two, so this won't be a problem.)
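Both heuristics can be precomputed from single-tile shortest paths. A sketch on an assumed board (4x4 cells, five 2x2 wheels; the question's figure is missing, and all names are hypothetical): `single_piece_costs` runs a 0-1 BFS over the 16 cells where each rotation costs `cost(wheel)`. With every rotation costing 1 this gives the per-number minimum for the first heuristic; charging only wheel j gives the per-wheel minima for the second. `h_combined` takes the greater of the two, as suggested.

```python
from collections import deque

WHEELS = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]  # assumed 2x2 wheels; goal: tile k in cell k-1


def ring(wheel):
    r, c = WHEELS[wheel]
    return [r * 4 + c, r * 4 + c + 1, (r + 1) * 4 + c + 1, (r + 1) * 4 + c]  # clockwise


def rotate(state, wheel, clockwise):
    rg, shift, out = ring(wheel), (1 if clockwise else -1), list(state)
    for i, cell in enumerate(rg):
        out[rg[(i + shift) % 4]] = state[cell]
    return tuple(out)


def single_piece_costs(cost):
    """table[a][b]: cheapest way to carry ONE tile from cell a to cell b, ignoring the others."""
    moves = {cell: [] for cell in range(16)}
    for w in range(len(WHEELS)):
        rg = ring(w)
        for i, cell in enumerate(rg):
            moves[cell] += [(w, rg[(i + 1) % 4]), (w, rg[(i - 1) % 4])]
    table = []
    for src in range(16):
        dist = [float("inf")] * 16
        dist[src] = 0
        dq = deque([src])
        while dq:  # 0-1 BFS: free edges go to the front, unit edges to the back
            a = dq.popleft()
            for w, b in moves[a]:
                c = cost(w)
                if dist[a] + c < dist[b]:
                    dist[b] = dist[a] + c
                    if c == 0:
                        dq.appendleft(b)
                    else:
                        dq.append(b)
        table.append(dist)
    return table


TOTAL = single_piece_costs(lambda w: 1)
PER_WHEEL = [single_piece_costs(lambda w, j=j: 1 if w == j else 0) for j in range(len(WHEELS))]


def h_max(state):
    """Rotations needed by the single worst-placed tile."""
    return max(TOTAL[cell][tile - 1] for cell, tile in enumerate(state))


def h_wheels(state):
    """For each wheel, the most rotations of that wheel any one tile needs; summed over wheels."""
    return sum(max(PER_WHEEL[j][cell][tile - 1] for cell, tile in enumerate(state))
               for j in range(len(WHEELS)))


def h_combined(state):
    return max(h_max(state), h_wheels(state))
```

Both values are lower bounds, so either (or their maximum) can be used directly as the A* heuristic.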

GPUImage Taking sum of columns of image

I'm using GPUImage in my project and I need an efficient way of computing the column sums of an image. The naive way would be to retrieve the raw data and add up the values of every column. Can anybody suggest a faster way?

One way to do this would be to use the approach I take with the GPUImageAverageColor class (as described in this answer), only instead of reducing the total size of each frame at each step, only do this for one dimension of the image.
The average color filter determines the average color of the overall image by stepping down in a factor of four in both X and Y, averaging 16 pixels into one at each step. If operating in a single direction, you should be able to use hardware interpolation to get an 18X reduction in a single direction per step with good performance. Your final step might either require a quick CPU-based iteration on the much smaller image or a tweaked version of this shader that pulls the last few pixels in a column together into the final result pixel for that column.
You'll notice that I've been talking about averaging here. That's because the output values of any OpenGL ES operation need to be expressed as colors, which only have a 0-255 range per channel. A sum will easily overflow this, but you could use an average as an approximation of your sum, with a more limited dynamic range.
If you only care about one color channel, you could possibly encode a larger value into the RGBA channels and maintain a 32-bit sum that way.
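The packing arithmetic for that idea, sketched on the CPU side in Python (the helper names are hypothetical). It only shows how a 32-bit value splits into four 8-bit channels and how to read it back, including from normalized [0, 1] channel values as a shader would write them; making a reduction shader add such packed values with carries between channels is the harder part and is not shown.

```python
def encode_rgba(value):
    """Split an unsigned 32-bit integer into four 8-bit channels (R holds the high byte)."""
    if not 0 <= value < (1 << 32):
        raise ValueError("value must fit in 32 bits")
    return ((value >> 24) & 0xFF, (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)


def decode_rgba(r, g, b, a):
    """Reassemble the 32-bit value from four bytes read back from the framebuffer."""
    return (r << 24) | (g << 16) | (b << 8) | a


def decode_normalized(rgba_floats):
    """Same, starting from normalized [0, 1] channel values (byte / 255)."""
    r, g, b, a = (round(c * 255) for c in rgba_floats)
    return decode_rgba(r, g, b, a)
```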
Beyond what I describe above, you could look at performing this sum with the help of the Accelerate framework. While probably not quite as fast as doing a shader-based reduction, it might be good enough for your needs.

Sine Table Interpolation

I want to put together an SDR system that initially tunes AM, and later FM, etc.
The system I am planning to use to do this will have a sine lookup table for Direct Digital Synthesis (DDS).
In order to tune properly I expect to need to be able to precisely control the frequency of the sine wave fed to the Mixer (multiplier in this case). I expect that linear interpolation will be close, but think a non-linear method will provide better results.
What is a good and fast interpolation method to use for sine tables? Multiplication and addition are cheap on the target system; division is costly.
Edit:
I am planning on implementing constants with multiply/shift functions to normalize the constants to scaled integers. Intermediate values will use wide adds, and multiplies will use 18 or 17 bits. Floating-point "pre-computation" can be used, but not on the target platform. When I say "division is costly", I mean that it has to be implemented using the multipliers and a lot of code. It's not unthinkable, but it should be avoided. However, true IEEE floating-point methods would take a significant amount of resources on this platform, and would also require a custom implementation.
Any SDR experiences would be helpful.

If you don't get very good results with linear interpolation you can try the trigonometric relations.
Sum and Difference Formulas
sin(A+B)=sinA*cosB + cosA*sinB
sin(A-B)=sinA*cosB - cosA*sinB
cos(A+B)=cosA*cosB - sinA*sinB
cos(A-B)=cosA*cosB + sinA*sinB
and you can have precalculated sin and cos values for the A and B ranges, i.e.
A range: 0, 10, 20, ... 90
B range: 0.01 ... 0.99
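A sketch of the two-table idea using sin(A+B) = sinA*cosB + cosA*sinB. For the tables to cover every angle, B has to span one full step of A; the layout below therefore assumes whole-degree A (0..90) and hundredth-degree B (0.00..0.99), which is my choice rather than the ranges listed above. The names are hypothetical, and Python floats stand in for the scaled integers the target would store.

```python
import math

# Assumed layout: A in whole degrees, B in hundredths of a degree within one A step.
COARSE = [math.radians(a) for a in range(91)]       # A = 0, 1, ..., 90 degrees
FINE = [math.radians(b / 100) for b in range(100)]  # B = 0.00, 0.01, ..., 0.99 degrees
SIN_A = [math.sin(a) for a in COARSE]
COS_A = [math.cos(a) for a in COARSE]
SIN_B = [math.sin(b) for b in FINE]
COS_B = [math.cos(b) for b in FINE]


def sin_deg(hundredths):
    """sin of an angle given in hundredths of a degree, 0 <= hundredths <= 9000."""
    if not 0 <= hundredths <= 9000:
        raise ValueError("angle out of the first quadrant")
    a, b = divmod(hundredths, 100)
    # Two multiplies and one add at run time: sin(A+B) = sinA*cosB + cosA*sinB
    return SIN_A[a] * COS_B[b] + COS_A[a] * SIN_B[b]
```

That is 382 stored values instead of 9001 for the same 0.01-degree resolution, at the cost of two multiplies per sample.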

table interpolation for smooth functions = ick hurl bleah. IMHO I would only use table interpolation on some really weird function, or where you absolutely needed to ensure you avoid discontinuities (note that the derivatives for interpolated tables are discontinuous though). By the time you finish doing table lookups and the required interpolation code, you could have already evaluated a polynomial or two, at least if multiplication doesn't cause you too much heartburn.
IMHO you're much better off using Chebyshev approximation for each segment of the sine waveform (e.g. -90 to +90 degrees, or -45 to +45 degrees, and then other segments of the same width), and picking the minimum-degree polynomial that reduces your error to the desired value. If the segment is small enough you could get away with a quadratic or maybe even a linear polynomial; there are tradeoffs between accuracy, number of segments, and polynomial degree.
See my post in this other question, it'll save you the trouble of calculating coefficients (at least if you believe my math).
(edit: in case this wasn't clear, you do the Chebyshev approximation at design-time on your favorite high-powered PC, so that at run-time you can use a dirtbag microcontroller or FPGA or whatever with a simple polynomial of degree 1-4. Don't go over degree 4 unless you know what you're doing, 3 or below would be better.)
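A sketch of that design-time/run-time split for the -45 to +45 degree segment, using NumPy's `chebinterpolate` (interpolation at Chebyshev points, which is close to but not exactly the minimax fit) and `cheb2poly` to get ordinary power-series coefficients. The function names are hypothetical, and these coefficients are not the ones from the linked post.

```python
import math

import numpy as np
from numpy.polynomial import chebyshev as C

HALF_WIDTH = math.pi / 4  # segment: -45 to +45 degrees, in radians


def design_coefficients(deg=3):
    """Design time (PC): Chebyshev interpolant of sin on the segment, in x = angle / HALF_WIDTH."""
    cheb = C.chebinterpolate(lambda x: np.sin(x * HALF_WIDTH), deg)
    return C.cheb2poly(cheb)  # power-series coefficients, lowest order first


def eval_poly(coeffs, x):
    """Run time (target): Horner's rule, multiplies and adds only."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc
```

On the target, the coefficients would be converted to scaled integers and x would be the phase offset within the segment.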

Why a table? This very fast function has its worst noise peak at -90 dB when the signal is at -20 dB. That's crazy good.
For resampling of audio, I always use one of the interpolators from the Elephant paper. This was discussed in a previous SO question.
If you're on a processor that doesn't have floating point (fp), you can still do these things, but they are harder. I've been there. I feel your pain. Good luck! I used to do conversions from fp to integer for fun, but now you'd have to pay me to do it. :-)
Cool online references that apply to your problem:
http://www.audiomulch.com/~rossb/code/sinusoids/
http://www.dattalo.com/technical/theory/sinewave.html
Edit: additional thoughts based on your comments
Since you're working on a tricky processor, maybe you should look into how to make your sine table have more angles to look up, but still keep it small.
Suppose you break a quadrant into 90 pieces (in reality, you'd probably use 256 pieces, but let's keep it 90 for familiarity and clarity). Encode those as 16 bits. That's 180 bytes of table so far.
Now, for every one of those degrees, we're going to have 9 (in reality probably 8 or 16) in-between points.
Let's take the range between 3 degrees and 4 degrees as an example.
sin(3)=0.052335956 //this will be in your table as a 16-bit number
sin(4)=0.069756474 //this will be in your table as a 16-bit number
so we're going to look at sin(3.1)
sin(3.1)=0.054078813 //we're going to be tricky and store the result
// in 8 bits as a percentage of the distance between
// sin(3) and sin(4)
What you want to do is figure out how sin(3.1) fits in between sin(3) and sin(4). If it's half way between, code that as a byte of 128. If it's a quarter of the way between, code that as 64.
That's an additional 90 bytes for each set of in-between points, so with 9 sets you've encoded down to a tenth of a degree at 16-bit resolution in only 180 + 90*9 = 990 bytes. You can extend this as needed (maybe going up to 32-bit angles and 16-bit tween angles) and linearly interpolate in between very quickly. To minimize storage space, you're taking advantage of the fact that consecutive values are close to each other.
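A sketch of this tween-byte table, with the table built in Python floats at design time and decoded with integer multiply and shift only. Assumptions: an extra 16-bit entry for 90 degrees so the last degree can be interpolated (182 bytes rather than 180), and hypothetical names throughout. A byte of 128 means "half way", as described.

```python
import math

SCALE = 65535  # full-scale 16-bit value
SINE16 = [round(SCALE * math.sin(math.radians(d))) for d in range(91)]  # 0..90 degrees


def build_tweens():
    """For each degree, 9 bytes giving where d.1 ... d.9 degrees fall between d and d+1."""
    tweens = []
    for d in range(90):
        base, step = SINE16[d], SINE16[d + 1] - SINE16[d]
        row = []
        for k in range(1, 10):
            exact = SCALE * math.sin(math.radians(d + k / 10))
            frac = (exact - base) / step                      # 0.5 means half way between
            row.append(min(255, max(0, round(frac * 256))))   # 128 = half, 64 = quarter
        tweens.append(row)
    return tweens


TWEEN = build_tweens()


def sin16_tenths(tenths):
    """16-bit sine of an angle in tenths of a degree, 0 <= tenths <= 900 (integer ops only)."""
    d, k = divmod(tenths, 10)
    if k == 0:
        return SINE16[d]
    step = SINE16[d + 1] - SINE16[d]
    return SINE16[d] + ((step * TWEEN[d][k - 1] + 128) >> 8)  # rounded multiply-and-shift
```

With a byte per tween point, the decoded value stays within a few counts of the exact 16-bit sine in this construction.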
Edit 2: better way to encode the in-between angles in a table
I just remembered that when I did this, I ended up very compactly expressing the difference between the expected value according to linear interpolation and the actual value. This error is always in the same direction.
I first calculated the maximum error in the range and then based the scale on that.
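A sketch of this refinement, under stated assumptions: 16 in-between points per degree (so interpolation is a shift rather than a divide by 10), one byte per point, and hypothetical names. The table stores the correction on top of linear interpolation, scaled by the largest correction in the range. Since sine is concave on 0..90 degrees, the true value lies above the chord, so the corrections are non-negative apart from tiny rounding effects, which are clamped to zero.

```python
import math

SCALE = 65535
SINE16 = [round(SCALE * math.sin(math.radians(d))) for d in range(91)]  # whole degrees 0..90
STEP_SHIFT = 4
STEPS = 1 << STEP_SHIFT  # 16 in-between points per degree


def linear16(d, k):
    """Plain linear interpolation k/16 of the way from d to d+1 degrees (shift, no divide)."""
    step = SINE16[d + 1] - SINE16[d]
    return SINE16[d] + ((step * k + STEPS // 2) >> STEP_SHIFT)


def build_error_table():
    # Correction over linear interpolation, in 16-bit counts (negatives come only from rounding).
    errs = [[max(0.0, SCALE * math.sin(math.radians(d + k / STEPS)) - linear16(d, k))
             for k in range(STEPS)] for d in range(90)]
    max_err = max(max(row) for row in errs)                             # scale set by worst case
    codes = [[round(e / max_err * 255) for e in row] for row in errs]  # one byte each
    err_mul = round(max_err / 255 * 65536)                             # 16 fractional bits
    return codes, err_mul


ERR_CODES, ERR_MUL = build_error_table()


def sin16(sixteenths):
    """16-bit sine for an angle in 1/16 degree, 0 <= sixteenths <= 90 * 16."""
    d, k = divmod(sixteenths, STEPS)
    if d == 90:
        return SINE16[90]
    return linear16(d, k) + ((ERR_CODES[d][k] * ERR_MUL + 32768) >> 16)
```

Because the maximum correction here is only a couple of counts, fewer than 8 bits per entry would likely suffice; the byte is kept for symmetry with the previous scheme.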
Worked great. I feel like I should do the code in a blog entry to illustrate. :-)

Interpolation in a sine table is effectively resampling. Obviously you can get perfect results by a single call to sin, so whatever your solution is it needs to outperform that. For fixed-filter resampling, you're still going to only have a fixed set of available points (a 3:1 upsampler means you'll have 2 new points available between each point in your table). How expensive is memory on the target system? My primary recommendation is simply improve the table resolution and use linear interpolation. You'll get the same results as a smaller table and simple upsample but with less computational overhead.

Have you considered using the Taylor series for the trig functions? This involves multiplication and division, but depending on how your numbers are represented you may be able to turn the division into multiplication (or bit shifts if you're very lucky). You can compute as many terms of the series as you need and get your precision that way.
Alternatively, if this sine wave is going to become an analog signal at some point, you could just use a lookup table approach and use an analog filter to remove the sampling frequency from the resulting waveform. If your sampling frequency is 100 times the sine frequency, it will be easy to remove. You'll need a variable filter to do this. I've never done such a thing, but I know there are digital potentiometers that take a binary number and change their resistance. That could be the basis of a variable RC filter, probably with some op-amps for gain, etc.
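A sketch of the "turn division into multiplication" point: the 1/n! factors are computed once at design time, so the run-time loop is Horner's rule with multiplies and adds only. It assumes the angle has already been reduced to |x| <= pi/2 (e.g. via quadrant symmetry); `sin_taylor` is a hypothetical name.

```python
import math

# Design time: reciprocal factorials as constants, so run time needs no division.
INV_FACT = [1.0 / math.factorial(n) for n in range(16)]


def sin_taylor(x, terms=5):
    """sin(x) for |x| <= pi/2 from the first `terms` odd Taylor terms (terms <= 8)."""
    # sin x = x * (1 - x^2/3! + x^4/5! - ...), evaluated in y = x^2 by Horner's rule
    y = x * x
    acc = 0.0
    for k in reversed(range(terms)):
        coeff = INV_FACT[2 * k + 1] if k % 2 == 0 else -INV_FACT[2 * k + 1]
        acc = acc * y + coeff
    return x * acc
```

Since the series alternates with shrinking terms on this range, the error is bounded by the first omitted term: about 3.6e-6 for 5 terms at x = pi/2.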
Good luck!

People have written some amazingly clever code for quickly calculating sin() on systems with tiny amounts of memory that don't even have a hardware multiply instruction, much less a division instruction.
In order of increasing complexity:
Use a square wave. Many AM radios use square waves in their ring demodulator, and I fail to see why your AM demodulator requires anything more complicated.
Approximate sin() by looking up the "closest value" in a raw table of 256 values per quarter-cycle. Yes, you see horrible-looking stair-steps, but (with a little bit of analog filtering) this often works well. (In fact, this is often overkill, and a much shorter table is adequate).
Approximate sin() by looking up the 2 closest values in a raw table, and linearly interpolating between them.
Approximate sin() with 16 short cubic splines per quarter-cycle, equally spaced in x; this "gives better than 16-bit precision" for sin(x).
Wikibooks: Fixed-Point Numbers links to some clever implementations of the last 3.
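A sketch of the third option as it might sit inside a DDS: a 32-bit phase accumulator whose top 2 bits pick the quadrant, the next 8 bits index a 256-entry quarter-wave table, and the remaining 22 bits drive linear interpolation with one multiply and a shift. The bit split, the extra table entry for sin(pi/2), and the names `dds_sample` and `tuning_word` are assumptions; the tuning word's division happens at design time only.

```python
import math

TABLE_BITS = 8
N = 1 << TABLE_BITS  # 256 entries per quarter-cycle
AMP = 32767          # signed 16-bit full scale
QUARTER = [round(AMP * math.sin(math.pi / 2 * i / N)) for i in range(N + 1)]  # includes sin(pi/2)
FRAC_BITS = 32 - 2 - TABLE_BITS  # phase bits below the table index (22)


def dds_sample(phase):
    """Signed 16-bit sine for a 32-bit phase (0 .. 2**32 - 1 spans one full cycle)."""
    quadrant = phase >> 30
    pos = phase & ((1 << 30) - 1)  # position within the quadrant
    if quadrant & 1:               # 2nd and 4th quadrants read the table backwards
        pos = (1 << 30) - pos
    idx = pos >> FRAC_BITS
    frac = pos & ((1 << FRAC_BITS) - 1)
    if idx == N:
        value = QUARTER[N]
    else:
        a, b = QUARTER[idx], QUARTER[idx + 1]
        value = a + (((b - a) * frac) >> FRAC_BITS)  # linear interpolation: multiply + shift
    return -value if quadrant & 2 else value         # 3rd and 4th quadrants are negative


def tuning_word(f_out, f_sample):
    """Design time: phase increment per sample for output frequency f_out."""
    return round(f_out / f_sample * (1 << 32)) & 0xFFFFFFFF
```

Per output sample, the run-time loop is `phase = (phase + step) & 0xFFFFFFFF` followed by `dds_sample(phase)`.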
