My team has been using the Z3 solver to perform passive learning. Passive learning entails obtaining from a set of observations a model consistent with all observations in the set. We consider models of different formalisms, the simplest being Deterministic Finite Automata (DFA) and Mealy machines. For DFAs, observations are just positive or negative samples.
The approach is very simplistic. Given the formalism and observations, we encode each observation into a Z3 constraint over (uninterpreted) functions which correspond to functions in the formalism definition. For DFAs for example, this definition includes a transition function (trans: States X Inputs -> States) and an output function (out: States -> Boolean).
Encoding say the observation (aa, +) would be done as follows:
out(trans(trans(start,a),a)) == True
Where start is the initial state. To construct a model, we add all the observation constraints to the solver. We also add a constraint which limits the number of states in the model. We solve the constraints for a limit of 1, 2, 3... states until the solver can find a solution. The solution is a minimum state-model that is consistent with the observations.
I posted a code snippet using Z3Py which does just this. Predictably, our approach is not scalable (the problem is NP-complete). I was wondering if there were any (small) tweaks we could perform to improve scalability? (in the way of trying out different sorts, strategies...)
We have already tried arranging all observations into a Prefix Tree and using this tree in encoding, but scalability was only marginally improved. I am well aware that there are much more scalable SAT-based approaches to this problem (reducing it to a graph coloring problem). We would like to see how far a simple SMT-based approach can take us.
So far, what I have found is that the best Sorts for defining inputs and states are DeclareSort. It also helps if we eliminate quantifiers from the state-size constraint. Interestingly enough, incremental solving did not really help. But it could be that I am not using it properly (I am an utter novice in SMT theory).
Thanks! BTW, I am unsure how viable/useful this test is as a benchmark for SMT solvers.
Related
While reading Extending Sledgehammer with SMT solvers I read the following:
In the original Sledgehammer architecture, the available lemmas were rewritten
to clause normal form using a naive application of distributive laws before the relevance filter was invoked. To avoid clausifying thousands of lemmas on each invocation, the clauses were kept in a cache. This design was technically incompatible with the (cache-unaware) smt method, and it was already unsatisfactory for ATPs, which include custom polynomial-time clausifiers.
My understanding of SMT so far is as follows: SMTs don't work over clauses. Instead, they try to build a model for the quantifier-free part of a problem. The search is refined by instantiating quantifiers according to some set of active terms. Thus, indeed no clausal form is needed for SMT solvers.
We rewrote the relevance filter so that it operates on arbitrary HOL formulas, trying to simulate the old behavior. To mimic the penalty associated with Skolem functions in the clause-based code, we keep track of polarities and detect quantifiers that give rise to Skolem functions.
What's the penalty associated with Skolem functions? I could understand they are not good for SMTs, but here it seems that they are bad for ATPs too...
First, SMT solvers do work over clauses and there is definitely some (non-naive) normalization internally (e.g., miniscoping). But you do not need to do the normalization before calling the SMT solver (especially, since it will be more naive and generate a larger number of clauses).
Anyway, Section 6.6.7 explains why skolemization was done on the Isabelle side. To summarize: it is not possible to introduce polymorphic constants in a proof in Isabelle; hence it must be done before starting the proof.
It seems likely that, when writing the paper, not changing the filtering lead to worse performance and, hence, the penalty was added. However, I tried to find the relevant code simulating clausification in Sledgehammer, so I don't believe that this happens anymore.
Problem
I'm trying to use z3 to disprove reachability assertions on a Petri net.
So I declare N state variables v0,..v_n-1 which are positive integers, one for each place of a Petri net.
My main strategy given an atomic proposition P on states is the following :
compute (with an exterior engine) any "easy" positive invariants as linear constraints on the variables, of the form alpha_0 * v_0 + ... = constant with only positive or zero alpha_i, then check_sat if any state reachable under these constraints satisfies P, if unsat conclude, else
compute (externally to z3) generalized invariants, where the alpha_i can be negative as well and check_sat, conclude if unsat, else
add one positive variable t_i per transition of the system, and assert the Petri net state equation, that any reachable state has a Parikh firing count vector (a value of t_i's) such that M0 the initial state + product of this Parikh vector by incidence matrix gives the reached state. So this one introduces many new variables, and involves some multiplication of variables, but stays a linear integer programming problem.
I separate the steps because since I want UNSAT, any check_sat that returns UNSAT stops the procedure, and the last step in particular is very costly.
I have issues with larger models, where I get prohibitively long answer times or even the dreaded "unknown" answer, particularly when adding state equation (step 3).
Background
So besides splitting the problem into incrementally harder segments I've tried setting logic to QF_LRA rather than QF_LIA, and declaring the variables as Real than integers.
This overapproximation is computationally friendly (z3 is fast on these !) but unfortunately for many models the solutions are not integers, nor is there an integer solution.
So I've tried setting Reals, but specifying that each variable is either =0 or >=1, to remove solutions with fractions of firings < 1. This does eliminate spurious solutions, but it "kills" z3 (timeout or unknown) in many cases, the problem is obviously much harder (e.g. harder than with just integers).
Examples
I don't have a small example to show, though I can produce some easily. The problem is if I go for QF_LIA it gets prohibitively slow at some number of variables. As a metric, there are many more transitions than places, so adding the state equation really ups the variable count.
This code is generating the examples I'm asking about.
This general presentation slides 5 and 6 express the problem I'm encoding precisely, and slides 7 and 8 develop the results of what "unsat" gives us, if you want more mathematical background.
I'm generating problems from the Model Checking Contest, with up to thousands of places (primary variables) and in some cases above a hundred thousand transitions. These are extremum, the middle range is a few thousand places, and maybe 20 thousand transitions that I would really like to deal with.
Reals + the greater than 1 constraint is not a good solution even for some smaller problems. Integers are slow from the get-go.
I could try Reals then iterate into Integers if I get a non integral solution, I have not tried that, though it involves pretty much killing and restarting the solver it might be a decent approach on my benchmark set.
What I'm looking for
I'm looking for some settings for Z3 that can better help it deal with the problems I'm feeding it, give it some insight.
I have some a priori idea about what could solve these problems, traditionally they've been fed to ILP solvers. So I'm hoping to trigger a simplex of some sort, but maybe there are conditions preventing z3 from using the "good" solution strategy in some cases.
I've become a decent level SMT/Z3 user, but I've never played with the fine settings of :options, to guide the solver.
Have any of you tried feeding what are basically ILP problems to SMT, and found options settings or particular encodings that help it deploy the right solutions ? thanks.
I am confused about how linear regression works in supervised learning. Now I want to generate a evaluation function for a board game using linear regression, so I need both the input data and output data. Input data is my board condition, and I need the corresponding value for this condition, right? But how can I get this expected value? Do I need to write an evaluation function first by myself? But I thought I need to generate an evluation function by using linear regression, so I'm a little confused about this.
It's supervised-learning after all, meaning: you will need input and output.
Now the question is: how to obtain these? And this is not trivial!
Candidates are:
historical-data (e.g. online-play history)
some form or self-play / reinforcement-learning (more complex)
But then a new problem arises: which output is available and what kind of input will you use.
If there would be some a-priori implemented AI, you could just take the scores of this one. But with historical-data for example you only got -1,0,1 (A wins, draw, B wins) which makes learning harder (and this touches the Credit Assignment problem: there might be one play which made someone lose; it's hard to understand which of 30 moves lead to the result of 1). This is also related to the input. Take chess for example and take a random position from some online game: there is the possibility that this position is unique over 10 million games (or at least not happening often) which conflicts with the expected performance of your approach. I assumed here, that the input is the full board-position. This changes for other inputs, e.g. chess-material, where the input is just a histogram of pieces (3 of these, 2 of these). Now there are much less unique inputs and learning will be easier.
Long story short: it's a complex task with a lot of different approaches and most of this is somewhat bound by your exact task! A linear evaluation-function is not super-uncommon in reinforcement-learning approaches. You might want to read some literature on these (this function is a core-component: e.g. table-lookup vs. linear-regression vs. neural-network to approximate the value- or policy-function).
I might add, that your task indicates the self-learning approach to AI, which is very hard and it's a topic which somewhat gained additional (there was success before: see Backgammon AI) popularity in the last years. But all of these approaches are highly complex and a good understanding of RL and the mathematical-basics like Markov-Decision-Processes are important then.
For more classic hand-made evaluation-function based AIs, a lot of people used an additional regressor for tuning / weighting already implemented components. Some overview at chessprogramming wiki. (the chess-material example from above might be a good one: assumption is: more pieces better than less; but it's hard to give them values)
I am implementing a SARSA(lambda) model in C++ to overcome some of the limitations (the sheer amount of time and space DP models require) of DP models, which hopefully will reduce the computation time (takes quite a few hours atm for similar research) and less space will allow adding more complexion to the model.
We do have explicit transition probabilities, and they do make a difference. So how should we incorporate them in a SARSA model?
Simply select the next state according to the probabilities themselves? Apparently SARSA models don't exactly expect you to use probabilities - or perhaps I've been reading the wrong books.
PS- Is there a way of knowing if the algorithm is properly implemented? First time working with SARSA.
The fundamental difference between Dynamic Programming (DP) and Reinforcement Learning (RL) is that the first assumes that environment's dynamics is known (i.e., a model), while the latter can learn directly from data obtained from the process, in the form of a set of samples, a set of process trajectories, or a single trajectory. Because of this feature, RL methods are useful when a model is difficult or costly to construct. However, it should be notice that both approaches share the same working principles (called Generalized Policy Iteration in Sutton's book).
Given they are similar, both approaches also share some limitations, namely, the curse of dimensionality. From Busoniu's book (chapter 3 is free and probably useful for your purposes):
A central challenge in the DP and RL fields is that, in their original
form (i.e., tabular form), DP and RL algorithms cannot be implemented
for general problems. They can only be implemented when the state and
action spaces consist of a finite number of discrete elements, because
(among other reasons) they require the exact representation of value
functions or policies, which is generally impossible for state spaces
with an infinite number of elements (or too costly when the number of
states is very high).
Even when the states and actions take finitely many values, the cost
of representing value functions and policies grows exponentially with
the number of state variables (and action variables, for Q-functions).
This problem is called the curse of dimensionality, and makes the
classical DP and RL algorithms impractical when there are many state
and action variables. To cope with these problems, versions of the
classical algorithms that approximately represent value functions
and/or policies must be used. Since most problems of practical
interest have large or continuous state and action spaces,
approximation is essential in DP and RL.
In your case, it seems quite clear that you should employ some kind of function approximation. However, given that you know the transition probability matrix, you can choose a method based on DP or RL. In the case of RL, transitions are simply used to compute the next state given an action.
Whether is better to use DP or RL? Actually I don't know the answer, and the optimal method likely depends on your specific problem. Intuitively, sampling a set of states in a planned way (DP) seems more safe, but maybe a big part of your state space is irrelevant to find an optimal pocliy. In such a case, sampling a set of trajectories (RL) maybe is more effective computationally. In any case, if both methods are rightly applied, should achive a similar solution.
NOTE: when employing function approximation, the convergence properties are more fragile and it is not rare to diverge during the iteration process, especially when the approximator is non linear (such as an artificial neural network) combined with RL.
If you have access to the transition probabilities, I would suggest not to use methods based on a Q-value. This will require additional sampling in order to extract information that you already have.
It may not always be the case, but without additional information I would say that modified policy iteration is a more appropriate method for your problem.
I am in a epic debate with a colleague who claims that reducing the number of hiddens is the best way to deal with over training.
While it can be demonstrated that generalization error decreases with training of such a net, ultimately it will not reach the level that more hiddens and early stopping can achieve.
I believe our project has many types of ill-"conditioning" of which nonstationarity is just one. I believe large numbers of hiddens are required to handle these issues which could be likened to classes of inputs.
While this seems intuitive to me, I can't make a convincing argument.
One of the most basic arguments is, that method should have a strong theoretical justification and useful implication. In particular, while number of hidden units can be use to reduce overfitting its main drawbacks are:
hard theoretical analysis - you can't really tell what difference makes adding two more neurons, while you can exactly say what changes when you change regularization strength
finite set of possible states - you can only have integer values of hidden units, leading to finite family of models you are considering; while using regularization (even simple L2 reg) gives you continuum of possible models due to the use of real regularization parameter