Does anybody know if Z3 supports SSMT (i.e. randomized quantifiers), or if there are any plans to add it?
Reference paper
There are no current plans to directly handle randomized quantifiers.
The authors of that paper may have more information. The paper points to a system: https://projects.avacs.org/projects/sisat/wiki/SiSAT_Manual
However, it has been inactive for the past 3 years, so I am not sure whether this
is something the authors are urgently pursuing. Maybe there are follow-on systems.
The closest I am considering is alternating min/max objectives. That is, you should be able to optimize some objective function f(x,y), where x, y
satisfy constraints Phi(x,y), subject to min_x max_y f(x,y).
The latter means: find x such that the maximum of f(x,y) over all admissible y is minimized.
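For what it's worth, here is a minimal Z3py sketch of how such a min/max objective can be approximated today with the Optimize API, via the standard epigraph trick and a quantified bound. The constraints and objective below are placeholders, and Z3 may well answer unknown on harder instances:

```python
from z3 import *

x, y, t = Ints('x y t')

opt = Optimize()
# Placeholder constraint Phi and objective f, for illustration only:
opt.add(0 <= x, x <= 10)                      # Phi, part constraining x
opt.add(ForAll([y],
        Implies(And(0 <= y, y <= 10),         # Phi, part constraining y
                2 * x - y <= t)))             # f(x, y) = 2x - y
# Epigraph trick: t bounds f(x, y) for every admissible y, so
# minimizing t computes min_x max_y f(x, y).
opt.minimize(t)

if opt.check() == sat:
    m = opt.model()
    print("x =", m[x], " bound t =", m[t])
```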
There is an ongoing (but currently low-activity) effort to implement SSMT (and similar) on top of (not inside) Z3, which outperforms Tino Teige's methods and Prism on some particular problems; see here and here.
Related
I have recently been reading the WaveNet and PixelCNN papers, and in both of them they mention that gated activation functions work better than a ReLU. But in neither case do they offer an explanation as to why that is.
I have asked on other platforms (like on r/machinelearning) but have not gotten any replies so far. Might it be that they just tried this replacement (by chance) and it turned out to yield favorable results?
Function for reference:
y = tanh(W_{k,f} ∗ x) ⊙ σ(W_{k,g} ∗ x)

i.e. element-wise multiplication (⊙) between the tanh and the sigmoid of two separate convolutions (∗) of the input.
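For concreteness, here is a tiny NumPy sketch of that gate; a dense matrix product stands in for the papers' convolutions, and all shapes are made up for illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gated_activation(x, W_f, W_g):
    # The tanh branch proposes values in (-1, 1); the sigmoid branch,
    # with values in (0, 1), acts as a soft per-unit gate on them.
    return np.tanh(W_f @ x) * sigmoid(W_g @ x)

# Toy usage with made-up shapes:
rng = np.random.default_rng(0)
x = rng.normal(size=16)
W_f = rng.normal(size=(8, 16))
W_g = rng.normal(size=(8, 16))
print(gated_activation(x, W_f, W_g).shape)   # (8,)
```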
I did some digging and talked some more with a friend, who pointed me towards a paper by Dauphin et al., "Language Modeling with Gated Convolutional Networks". It offers a good explanation of this topic in section 3 of the paper:
LSTMs enable long-term memory via a separate cell controlled by input and forget gates. This allows information to flow unimpeded through potentially many timesteps. Without these gates, information could easily vanish through the transformations of each timestep.
In contrast, convolutional networks do not suffer from the same kind of vanishing gradient and we find experimentally that they do not require forget gates. Therefore, we consider models possessing solely output gates, which allow the network to control what information should be propagated through the hierarchy of layers.
In other words, they adopted the concept of gates and applied it to sequential convolutional layers, to control what type of information is let through, and apparently this works better than using a ReLU.
Edit: but WHY it works better, I still don't know; if anyone could give me an even remotely intuitive answer I would be grateful. I looked around a bit more, and apparently we are still basing our judgement on trial and error.
I believe it's because it's highly non-linear near zero, unlike ReLU. With their new activation function [tanh(W1 * x) * sigmoid(W2 * x)] you get a function that has some interesting bends in the [-1,1] range.
Don't forget that this isn't operating on the feature space, but on a matrix multiplication of the feature space, so it's not just "bigger feature values do this, smaller feature values do that" but rather it operates on the outputs of a linear transform of the feature space.
Basically it chooses regions to highlight, regions to ignore, and does so flexibly (and non-linearly) thanks to the activation.
https://www.desmos.com/calculator/owmzbnonlh , see "c" function.
This allows the model to separate the data in the gated attention space.
That's my understanding of it but it is still pretty alchemical to me as well.
Problem
I'm trying to use z3 to disprove reachability assertions on a Petri net.
So I declare n state variables v_0, ..., v_{n-1}, which are non-negative integers, one for each place of the Petri net.
My main strategy, given an atomic proposition P on states, is the following:
1. Compute (with an exterior engine) any "easy" positive invariants as linear constraints on the variables, of the form alpha_0 * v_0 + ... = constant with only positive or zero alpha_i, then check_sat whether any state reachable under these constraints satisfies P; if unsat, conclude, else
2. compute (externally to Z3) generalized invariants, where the alpha_i can be negative as well, and check_sat; conclude if unsat, else
3. add one non-negative variable t_i per transition of the system and assert the Petri net state equation: any reachable state has a Parikh firing-count vector (a value for the t_i's) such that M0, the initial state, plus the product of this Parikh vector by the incidence matrix gives the reached state. This step introduces many new variables, but since variables are only ever multiplied by constant coefficients it stays a linear integer programming problem (a sketch of this encoding follows the list).
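A minimal Z3py sketch of the step 3 encoding, on a hypothetical 2-place, 2-transition net; the incidence matrix C, the initial marking M0, and all sizes below are made up for illustration:

```python
from z3 import *

# Hypothetical net: rows of C are places, columns are transitions.
C  = [[ 1, -1],
      [-1,  1]]
M0 = [1, 0]
n_places, n_trans = 2, 2

v = [Int(f"v{i}") for i in range(n_places)]   # reached marking
t = [Int(f"t{j}") for j in range(n_trans)]    # Parikh firing counts

s = Solver()
s.add([vi >= 0 for vi in v])
s.add([tj >= 0 for tj in t])
# State equation: v = M0 + C*t (all coefficients constant, so linear).
for i in range(n_places):
    s.add(v[i] == M0[i] + Sum([C[i][j] * t[j] for j in range(n_trans)]))

# ...conjoin the atomic proposition P over v here, then:
print(s.check())
```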
I separate the steps because, since I want UNSAT, any check_sat that returns UNSAT stops the procedure, and the last step in particular is very costly.
I have issues with larger models, where I get prohibitively long answer times or even the dreaded "unknown" answer, particularly when adding the state equation (step 3).
Background
So, besides splitting the problem into incrementally harder segments, I've tried setting the logic to QF_LRA rather than QF_LIA, and declaring the variables as Real rather than Int.
This over-approximation is computationally friendly (Z3 is fast on these!), but unfortunately for many models the solutions are not integers, nor is there an integer solution.
So I've tried keeping Reals but specifying that each variable is either = 0 or >= 1, to remove solutions with fractions of firings < 1. This does eliminate spurious solutions, but it "kills" Z3 (timeout or unknown) in many cases; the problem is obviously much harder (e.g. harder than with just integers).
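Sketched in Z3py, that relaxation reads as below (variable names are made up); note that each disjunction introduces a case split, which is presumably why it hurts so much:

```python
from z3 import *

t = [Real(f"t{j}") for j in range(3)]    # hypothetical firing counts
s = SolverFor('QF_LRA')
for tj in t:
    # Semi-continuous domain: excludes fractional firings below 1,
    # at the price of one case split per variable.
    s.add(Or(tj == 0, tj >= 1))
print(s.check())
```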
Examples
I don't have a small example to show, though I can produce some easily. The problem is that with QF_LIA it gets prohibitively slow beyond a certain number of variables. As a metric: there are many more transitions than places, so adding the state equation really ups the variable count.
This code generates the examples I'm asking about.
This general presentation (slides 5 and 6) expresses the problem I'm encoding precisely, and slides 7 and 8 develop the results of what "unsat" gives us, if you want more mathematical background.
I'm generating problems from the Model Checking Contest, with up to thousands of places (primary variables) and in some cases over a hundred thousand transitions. Those are the extremes; the middle range that I would really like to deal with is a few thousand places and maybe 20 thousand transitions.
Reals + the greater than 1 constraint is not a good solution even for some smaller problems. Integers are slow from the get-go.
I could try Reals, then iterate into Integers if I get a non-integral solution. I have not tried that; though it involves pretty much killing and restarting the solver, it might be a decent approach on my benchmark set.
What I'm looking for
I'm looking for some settings for Z3 that can better help it deal with the problems I'm feeding it, give it some insight.
I have some a priori idea of what could solve these problems: traditionally they've been fed to ILP solvers. So I'm hoping to trigger a simplex of some sort, but maybe there are conditions preventing Z3 from using the "good" solution strategy in some cases.
I've become a decent-level SMT/Z3 user, but I've never played with the fine-grained settings (:options) to guide the solver.
Have any of you tried feeding what are basically ILP problems to SMT, and found option settings or particular encodings that help it deploy the right solutions? Thanks.
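Not a full answer, but for orientation, this is where such knobs live in the Python API; the specific parameter and value below are assumptions for illustration, not tested recommendations (run `z3 -p` for the full parameter list):

```python
from z3 import *

# Global parameters go through set_param; smt.arith.solver selects the
# arithmetic core (the value 2 here is an assumption for illustration,
# not a tested recommendation for these benchmarks).
set_param('smt.arith.solver', 2)

# SolverFor picks the tactic pipeline Z3 ties to a logic, which can
# behave quite differently from the generic Solver():
s = SolverFor('QF_LIA')
s.set('timeout', 60000)   # per-check timeout, in milliseconds
```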
I want to find the n-dimensional point (x_1, ..., x_n) in integer space that satisfies some properties, while also maximizing the minimum distance between x and any element of a collection of m (pre-defined/constant) n-dimensional points (z_11, ..., z_1n), (z_21, ..., z_2n), ..., (z_m1, ..., z_mn). Is there a way to do this using Z3?
Sure. See: https://rise4fun.com/Z3/tutorial/optimization
The above link talks about the SMT-Lib interface, but the same functionality is also available from the Python interface (and from most other bindings to Z3).
Note that optimization is largely for linear properties. If you have non-linear terms, you might want to formulate them so that a linear counterpart can be optimized instead. Even with non-linear terms you might get good results; impossible to know without trying.
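A hedged Z3py sketch of the max-min-distance query in two dimensions, using Manhattan distance to keep everything linear; the fixed points `zs` and the bounding-box "properties" below are placeholders:

```python
from z3 import *

zs = [(0, 0), (5, 5), (9, 1)]        # hypothetical fixed points
x0, x1, d = Ints('x0 x1 d')          # sought point and distance bound

def absdiff(a, b):
    return If(a >= b, a - b, b - a)

opt = Optimize()
opt.add(0 <= x0, x0 <= 10, 0 <= x1, x1 <= 10)   # stand-in "properties"
for z0, z1 in zs:
    # d lower-bounds the Manhattan distance to every fixed point,
    # so maximizing d maximizes the minimum distance.
    opt.add(absdiff(x0, z0) + absdiff(x1, z1) >= d)
opt.maximize(d)
if opt.check() == sat:
    print(opt.model())
```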
I want to make a genetic algorithm that solves a shortest path problem in a weighted, connected graph. Similar to travelling salesman, but instead of a fully-connected graph, it's just connected.
My idea is to randomly generate, for each chromosome, a path consisting of n-1 nodes in binary form, where the numbers indicate nodes in the path. Then I will choose the best depending on the sum of weights (if you can't go from A to B, I would give it a penalty) and crossover/mutate bits in it. Will it work? It feels a little like a smaller version of brute force. Is there a better way?
Thanks!
A genetic algorithm is pretty much a "smaller version of brute force". It is just a metaheuristic, not an optimization method with decent convergence guarantees. It basically depends on randomness to provide new solutions, thus it is a "slightly better random search".
So "will it work"? Yes, it will do something, as long as you have enough randomness in mutation it will even (eventually) converge to optimum. Will it work better than a random search? Hard to say, this depends on dozens of factors, not only your encoding, but also all the hyperparameters used etc. in general genetic algorithms are about trials and errors. In particular representation of chromosomes which does not loose any information (yours does not) does not matter, meaning that everything depends on clever implementation of crossover and mutation (as long as chromosomes do not loose any information they are all equivalent).
You can use a permutation-coded GA. In permutation coding, you give the start and end points, and the GA searches for the best chromosome under your fitness function. Candidate solutions (chromosomes) will look like 2-5-4-3-1, 2-3-1-4-5, 1-2-5-4-3, etc. So your solution depends on your fitness function. (Look at the GA package for R to apply a permutation GA easily.)
Connections are constraints for your problem. My best advice is to create a constraint matrix like this:
FirstPoint   SecondPoint   Connected
A            B             true
A            C             true
A            E             false
...          ...           ...
In standard TSP, only distances are considered. In your fitness function, you have to consider this matrix and add a penalty to the return value for each false (a sketch follows the examples below).
Example chromosome: A-B-E-D-C
A-B: 1
B-E: 1
E-D: 4
D-C: 3
Fitness value: 9
Example chromosome: A-E-B-C-D
A-E: penalty
E-B: 1
B-C: 6
C-D: 3
Fitness value: 10 + penalty value.
Because your constraint is a hard constraint, you can use the maximum integer value as the penalty. The GA will find the best solution. :)
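A minimal Python sketch of that penalty fitness, with the edge weights transcribed from the examples above and a large constant standing in for the "max integer value":

```python
PENALTY = 10**9   # stand-in for the "max integer value" penalty

# Edge weights transcribed from the examples above; missing pairs
# (such as A-E) are unconnected and incur the penalty.
weights = {('A', 'B'): 1, ('B', 'E'): 1, ('E', 'D'): 4, ('D', 'C'): 3,
           ('E', 'B'): 1, ('B', 'C'): 6, ('C', 'D'): 3}

def fitness(path):
    total = 0
    for a, b in zip(path, path[1:]):
        w = weights.get((a, b))
        total += PENALTY if w is None else w
    return total

print(fitness(['A', 'B', 'E', 'D', 'C']))   # 9
print(fitness(['A', 'E', 'B', 'C', 'D']))   # 10 + penalty
```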
I have a set of X-Y values (i.e. a scatter plot) and I want a Pascal routine to generate the coefficients of an Nth-order polynomial that fits those points, in the same way that Excel does.
I used David J Taylor's Polyfit example (curvefit.zip), which implements a least-squares curve-fitting algorithm (also known as linear regression). David's site is here, but keep reading, because my version is better (see below).
The algorithms David is using originate in a book of scientific math for Pascal programmers: Allen Miller's curve-fitting routine from "Pascal Programs For Scientists And Engineers", typed and submitted to MTPUG in Oct. 1982 by Juergen Loewner, and corrected and adapted for Turbo Pascal by Jeff Weiss.
You can grab curvefit.zip directly from bitbucket here. (You can clone the source code with Mercurial/TortoiseHG, or download a ZIP from bitbucket.)
hg clone https://bitbucket.org/wpostma/curvefit curvefit
It runs in any Delphi version from 5 up, Unicode or not, even Delphi 10 Berlin. It has a little chart in the demo, added by me. I also added a way to force the result through the origin, a common technique where you want a best fit on all values other than the constant term, which should be forced either to zero or to some experimentally derived average. A forced "blank subtraction", set equal to the average of a series of analytical "zero samples", is common in certain types of analytical chemistry with certain kinds of instrumentation, and in other scientific cases where it can be more useful than a plain best fit, because you may wish to minimize error around the origin more than across the part of the curve farthest from the origin.
I should also clarify that for purposes of linear regression a "curve" may also be a line, which is the case I needed for analytical chemistry purposes; the equation of a straight line (y = mx + b) is then also called the "calibration curve". A first-order curve fit is a line (y = mx + b), and a second-order curve fit is a parabola (y = nx^2 + mx + b). As you might guess, the algorithm scales from first order up to any level you might wish, though I haven't tested it above 8 terms.
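For readers who want the math without Pascal, a short NumPy sketch of an equivalent least-squares polynomial fit, including the force-through-origin variant described above (the data values are made up):

```python
import numpy as np

# Made-up calibration data for illustration:
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([0.1, 0.9, 4.2, 8.8, 16.1])

# Second-order least-squares fit, y = n*x^2 + m*x + b
# (np.polyfit returns coefficients highest order first):
n, m, b = np.polyfit(xs, ys, deg=2)
print(n, m, b)

# Forcing the fit through the origin: drop the constant column of the
# Vandermonde matrix and solve the least-squares system directly.
A = np.vander(xs, 3)[:, :-1]            # columns: x^2, x (no constant)
n0, m0 = np.linalg.lstsq(A, ys, rcond=None)[0]
print(n0, m0)
```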
Bitbucket project link:
https://bitbucket.org/wpostma/curvefit/overview
Try TPMath (http://tpmath.sourceforge.net/) - I've been using it for years for fitting a Hill regression and can recommend it.
Check the functions in TurboPower's SysTools library, which is now open source; it includes math functions in the StStat unit.
Even though you've already awarded an answer, for completeness, I thought I'd add this:
We use SDL Components' Math pack and have been very happy with it.
http://www.lohninger.com/delfcomp.html
It's well thought out, and does exactly what we need.
He's got a variety of other interesting tools on his site.
XlXtrFun is the best curve-fitting tool I know and use, but it is for Excel:
http://www.xlxtrfun.com/XlXtrFun/XlXtrFun.htm