I'm trying to use a finite state machine diagram to represent the Stack abstract data type, and I'm struggling to find a way to represent an unbounded alphabet. A stack can have an infinite number elements, but I can't draw infinite states in my diagram.
The solution I'm leaning toward is to use recursion, but I can't find any examples of expressing recursion in a finite state machine diagram. Is there a standard way to draw a recursion? Or is there another solution to my infinity problem?
The machine model that corresponds to a stack is the push-down automaton. This model is known to be more powerful than finite state machines. Thus it is impossible to represent a stack with the latter. There is no solution to your problem.
You do not have an unbounded alphabet, though. Only the number of symbols on the stack is unbounded.
Related
I'm currently studying compiler's and am on the topic of "Chomsky Hierarchy and the 4 languages." But it beats me as to what the practical purpose of all this is?
It'd be great if I could see real-life examples of the 4 grammars: Unrestricted, CSG, CFG, Regular Grammer come to play.
I found online that Chomsky hierarchy along with the 4 grammars is used to evaluate proposals within cognitive science but this goes way over my head. It'd be great if someone could break it down for me, thanks a lot!
There is no practical value. That's the whole point.
Let me try to break that down a bit. It's useful to remember that Chomsky is a linguist --someone who studies human languages-- and that he was writing in the late 1950s when computational theory was not as well-developed as it is today. (To put it mildly.) His goal was to find a mathematical model which could provide some insights into the mechanisms by which human beings generate and understand sentences, and he took as his starting point a particular simple model of sentence generation.
In this model, a grammar is a function F, which transforms elements from an arbitrary sequence of symbols from some alphabet onto another sequence of symbols from the same alphabet. F is defined by a finite set of pairs (called productions) α → β. We then say that F(ω) = F(ζ) if the definition of F contains some pair α → β such that α is a substring of ω and ζ is the result of substituting a single instance of α in ω with β.
That's not very interesting in and of itself; we make that into a full language by starting with some designated starting sequence, normally represented as the single symbol S, and repeatedly apply F as many times as is necessary. (In all interesting grammars, the set so constructed is infinite, so it cannot actually be constructed. But we can imagine proceeding from the starting point until we find the sentence we wanted to generate.)
The problem with this model is that it can be used to describe an arbitrary Turing Machine. Or, if you like, an arbitrary computer program, although the equivalence is easier to see with a Turing Machine. In other words, it is at least theoretically possible to construct a finite grammar which will recognise strings consisting of the description of a Turing Machine (i.e., a program written in some programming language) followed by an input and an output only if the Turing Machine applied to the input would produce the output. In other words, there exists (in the mathematical sense) a grammar of this form which is computationally equivalent to a general purpose computer.
Unfortunately, that's not actually very useful if our goal is to understand sentences, because there is actually no algorithm for running computers backwards. The best we can do as a general solution is to enumerate all possible inputs and run the program on each of them until we find the output we hoped for. And that doesn't actually work because there is no limit to the amount of time the program might take to produce an output and no way to even know if the program will eventually come to an end. (This is called the "halting problem".) So we might get stuck on some possible input, and we'll never know if some other input might have produced the desired output.
As a result, we cannot tell whether the provided input was "grammatical", that is, whether it conformed to the grammar provided. And that's not just the case with the particular grammar we built to emulate Turing Machines. It means that we have no confidence that we can recognise sentences from any arbitrary grammar, and even if we stumble upon an answer, we have no way to limit how much time it might take to get there.
Clearly, this is not how human beings understand each other. So if it is to serve a practical purpose, we must restrict the possible grammars in some way to make them computationally feasible.
On the other end of the spectrum, a lot was known about finite-state machines. A finite-state machine is a Turing Machine without a tape; that is to say, it is simply a finite collection of states. In each state, the machine reads a single input symbol and uses it to decide what the next state will be. It turns out that finite-state machines can be modelled using a grammar (as above) restricted to very simple productions, each of which is either of the form A → a B or A → a, where a is a symbol from the "terminal alphabet" (that is, a word) and A and B are single grammatical symbols. These grammars are called "regular grammars" and they are computationally equivalent to what mathematicians call "regular expressions" (which are a small subset of what is recognised by "regex" libraries, but that's a whole other discussion).
Regular grammars are easy to parse. All that is needed to is trace through the state machine, so it can be done without backtracking in time proportional to the length of the input. But regular grammars are far too weak to be able to represent human language, or even most computer languages. As a simple example, algebraic expressions with parentheses cannot be recognised with a regular grammar (or with a finite-state machine) because there is no way to count the parenthesis depth; the finite-state machine has no memory at all (other than knowing which state it is in, and there are only a finite number of states).
So unrestricted grammars are too powerful to parse and regular grammars are too weak to be useful. (Useful for complex parsing problems, that is. There are certain applications for regular expressions, but parsing complete computer programs is not one of them.)
The next step, then, was to try to find a restriction on grammars which was still powerful enough to represent human language without being so powerful that parsing became impossible.
That, finally, is the origin of Chomsky's hierarchy. Between the two extremes described above (type 0 and type 3 grammars), Chomsky proposed two possible intermediate restrictions -- type 1 and type 2 grammars -- and proved a number of important properties about each of them.
While this work turned out to be fundamental in the development of formal language theory, it cannot really be said to have answered the question Chomsky started with. Type 2 grammars -- context-free grammars -- are indeed computationally tractable; they can be parsed with simple algorithms in polynomial time, and can represent a large number of useful languages. But they are still too weak to represent human language. In particular, context-free grammars cannot represent a language as simple as "all strings which contain two instances of the same substring". Type 1 grammars -- context-sensitive grammars -- can probably represent any useful language, and are not quite as unruly as unrestricted grammars, but they are still too powerful to parse. (Since the derivation steps in a context-sensitive grammar never get shorter, it is possible to enumerate all possible derivations from a starting point in order by length, which means that you can decide whether a sentence is generated by the grammar without running into the halting problem. But that's as good as it gets; that procedure takes exponential time and is not remotely feasible for non-trivial inputs.)
In the six decades since Chomsky published his seminal papers, a lot of work has been done to try to find useful intermediate restrictions between type 1 and type 2 grammars. And there has been a lot of useful study into algorithms for parsing context-free languages, which is of enormous utility in building compilers. All of this builds on the crucial work done by Chomsky and the other computational theorists whose work he built on -- Markov, Turing, Church and Kleene, just to name a few worthy of study. But Chomsky's original project remains unsolved.
So if your goal is to build a simple parser for a programming language, the Chomsky hierarchy is probably just an interesting footnote. But if you are interested in the academic study of formal language theory, there are still lots of interesting unsolved problems to work on.
Problem
I'm trying to use z3 to disprove reachability assertions on a Petri net.
So I declare N state variables v0,..v_n-1 which are positive integers, one for each place of a Petri net.
My main strategy given an atomic proposition P on states is the following :
compute (with an exterior engine) any "easy" positive invariants as linear constraints on the variables, of the form alpha_0 * v_0 + ... = constant with only positive or zero alpha_i, then check_sat if any state reachable under these constraints satisfies P, if unsat conclude, else
compute (externally to z3) generalized invariants, where the alpha_i can be negative as well and check_sat, conclude if unsat, else
add one positive variable t_i per transition of the system, and assert the Petri net state equation, that any reachable state has a Parikh firing count vector (a value of t_i's) such that M0 the initial state + product of this Parikh vector by incidence matrix gives the reached state. So this one introduces many new variables, and involves some multiplication of variables, but stays a linear integer programming problem.
I separate the steps because since I want UNSAT, any check_sat that returns UNSAT stops the procedure, and the last step in particular is very costly.
I have issues with larger models, where I get prohibitively long answer times or even the dreaded "unknown" answer, particularly when adding state equation (step 3).
Background
So besides splitting the problem into incrementally harder segments I've tried setting logic to QF_LRA rather than QF_LIA, and declaring the variables as Real than integers.
This overapproximation is computationally friendly (z3 is fast on these !) but unfortunately for many models the solutions are not integers, nor is there an integer solution.
So I've tried setting Reals, but specifying that each variable is either =0 or >=1, to remove solutions with fractions of firings < 1. This does eliminate spurious solutions, but it "kills" z3 (timeout or unknown) in many cases, the problem is obviously much harder (e.g. harder than with just integers).
Examples
I don't have a small example to show, though I can produce some easily. The problem is if I go for QF_LIA it gets prohibitively slow at some number of variables. As a metric, there are many more transitions than places, so adding the state equation really ups the variable count.
This code is generating the examples I'm asking about.
This general presentation slides 5 and 6 express the problem I'm encoding precisely, and slides 7 and 8 develop the results of what "unsat" gives us, if you want more mathematical background.
I'm generating problems from the Model Checking Contest, with up to thousands of places (primary variables) and in some cases above a hundred thousand transitions. These are extremum, the middle range is a few thousand places, and maybe 20 thousand transitions that I would really like to deal with.
Reals + the greater than 1 constraint is not a good solution even for some smaller problems. Integers are slow from the get-go.
I could try Reals then iterate into Integers if I get a non integral solution, I have not tried that, though it involves pretty much killing and restarting the solver it might be a decent approach on my benchmark set.
What I'm looking for
I'm looking for some settings for Z3 that can better help it deal with the problems I'm feeding it, give it some insight.
I have some a priori idea about what could solve these problems, traditionally they've been fed to ILP solvers. So I'm hoping to trigger a simplex of some sort, but maybe there are conditions preventing z3 from using the "good" solution strategy in some cases.
I've become a decent level SMT/Z3 user, but I've never played with the fine settings of :options, to guide the solver.
Have any of you tried feeding what are basically ILP problems to SMT, and found options settings or particular encodings that help it deploy the right solutions ? thanks.
My team has been using the Z3 solver to perform passive learning. Passive learning entails obtaining from a set of observations a model consistent with all observations in the set. We consider models of different formalisms, the simplest being Deterministic Finite Automata (DFA) and Mealy machines. For DFAs, observations are just positive or negative samples.
The approach is very simplistic. Given the formalism and observations, we encode each observation into a Z3 constraint over (uninterpreted) functions which correspond to functions in the formalism definition. For DFAs for example, this definition includes a transition function (trans: States X Inputs -> States) and an output function (out: States -> Boolean).
Encoding say the observation (aa, +) would be done as follows:
out(trans(trans(start,a),a)) == True
Where start is the initial state. To construct a model, we add all the observation constraints to the solver. We also add a constraint which limits the number of states in the model. We solve the constraints for a limit of 1, 2, 3... states until the solver can find a solution. The solution is a minimum state-model that is consistent with the observations.
I posted a code snippet using Z3Py which does just this. Predictably, our approach is not scalable (the problem is NP-complete). I was wondering if there were any (small) tweaks we could perform to improve scalability? (in the way of trying out different sorts, strategies...)
We have already tried arranging all observations into a Prefix Tree and using this tree in encoding, but scalability was only marginally improved. I am well aware that there are much more scalable SAT-based approaches to this problem (reducing it to a graph coloring problem). We would like to see how far a simple SMT-based approach can take us.
So far, what I have found is that the best Sorts for defining inputs and states are DeclareSort. It also helps if we eliminate quantifiers from the state-size constraint. Interestingly enough, incremental solving did not really help. But it could be that I am not using it properly (I am an utter novice in SMT theory).
Thanks! BTW, I am unsure how viable/useful this test is as a benchmark for SMT solvers.
F# is often promoted as a functional language where data is immutable by default, however the elements of the matrix and vector types in the F# Powerpack are mutable. Why is this?
Furthermore, for which reason are sparse matrices implemented as immutable as opposed to normal matrices?
The standard array type ('T[]) in F# is also mutable. You're mostly correct -- F# is a functional language where data immutability is encouraged, but not required. Basically, F# allows you to do write both mutable/imperative code and immutable/functional code; it's up to you to decide the best way to implement the code for your specific application.
Another reason for having mutable arrays and matrices is performance -- it is possible to implement very fast algorithms with immutable types, but users writing scientific computations usually only care about one thing: achieving maximum performance. That being that case, it follows that the arrays and matrices should be mutable.
For truly high performance, mutability is required, in one specific case : Provided that your code is perfectly optimized and that you master everything it is doing down to the cache (L1, L2) pattern of access of your program, then nothing beats a low level, to the metal approach.
This happens mostly only when you have one well specified problem that stays constant for 20 years, aka mostly in scientific tasks.
As soon as you depart from this specific case, in 99.99% the bottlenecks arise from having a too low level representation (induced by a low level langage) in which you can't express the final, real-world optimization trade-off of your problem at hand.
Bottom line, for performance, the following approach is the only way (i think) :
High level / algorithmic optimization first
Once every high level ways has been explored, low level optimization
You can see how as a consequence of that :
You should never optimize anything without FIRST measuring the impact : improvements should only be made if they yield enormous performance gains and/or do not degrade your domain logic.
You eventually will reach, if your problem is stable and well defined, the point where you will have no choice but to go to the low level, and play with memory/mutability
In general, and more specifically for Bernoulli mixture model (aka Latent Class Analysis).
EM is pretty much the same as Lloyds variant of k-means. So each iteration takes O(n*k) distance computations. They are more complex distance computations, but in the end they are some variant of Mahalanobis distance.
However, k-means has a quite nice termination behavior. There are only k^n possible cluster assignments. Now if in each step, you choose one that is better, it will have to terminate the latest after trying out all k^n. But in reality, it usually terminates after at most a few dozen steps.
For EM, this is not as easy. Objects are not assigned to a single cluster, but as in fuzzy-c-means are assigned relatively to all clusters. And that's when you lose this termination guarantee.
So without any stopping threshold, EM would infinitely optimize the cluster assignments, up to an infinite precision (assuming you would implement it with infinite precision).
As such, the theoretical runtime of EM is: infinite.
Any threshold (and if it's just hardware floating point precision) will make it finish earlier. But it will be hard to get a theoretical limit here different than O(n*k*i) where i is the number of iterations (which could be infinite, but which you can also set to 0 if you don't want to do a single iteration).
Since the EM algorithm is by nature iterative, you must decide on a termination criterion. If you fix an upper bound on the number of steps, runtime analysis is obviously easy. For other termination criteria (like convergence up to a constant difference), the situation must be analyzed specifically.
Long story short, the description "EM" does not include a termination criterion, so the question can't be answered as such.
This is what I think:
If we presume that the EM algorithm uses linear algebra, which it does, then its complexity should be O(m.n^3), where m is the number of
iterations and n is the number of parameters.
The number of iterations will be influenced by the quality of the starting values. Good starting values means small m.
Not-so-good starting values means larger m or even failure because of convergence to a local solution. Local solutions can exist because EM is used on likelihood functions which are nonlinear.
Good starting values means that you start in the convex zone of attraction that surrounds the locus of the globally optimal solution.