DFA minimization in F#

I'm completely new to functional programming and have elected to use F# for a project which entails the parsing and minimization of a DFA.
I currently have my parser completed and am able to format each element of the DFA's tuple (states, alphabet, transition function, start state, final states) in whatever way I'd like and I have reached the point where I need to implement the minimization algorithm. The algorithm being used is:
For some DFA (Q, Σ, δ, S, F) where
    Q: the set of states
    Σ: the alphabet
    δ: the transition function
    S: the start state
    F: the set of final states

Step 1. For each pair of states (p, q) ∈ Q × Q:
    If p ∈ F and q ∉ F (or vice versa), then set distinct(p, q) = 1.
Step 2. Loop until there is no change in the table contents:
    For each pair of states (p, q) ∈ Q × Q:
        For each alphabet symbol a ∈ Σ:
            If distinct(p, q) = 0 and distinct(δ(p, a), δ(q, a)) = 1, then set distinct(p, q) = 1.
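As an illustration of the two steps above, here is a minimal table-filling sketch. The thread mixes F# and Erlang, so the sketch uses Python; the function name `distinct_pairs` and the dictionary representation of δ are assumptions of this example, not the asker's format. It assumes δ is total; if some transitions are missing, an explicit sink state would have to be added first.

```python
from itertools import combinations

def distinct_pairs(states, alphabet, delta, finals):
    """Table-filling algorithm: return the set of pairs marked distinct.

    delta maps (state, symbol) -> state and must be total
    (add an explicit sink state first if some transitions are missing).
    Each unordered pair {p, q} is stored as a frozenset.
    """
    finals = set(finals)
    pairs = [frozenset(pq) for pq in combinations(states, 2)]
    # Step 1: mark pairs where exactly one of the two states is final.
    distinct = {pq for pq in pairs if len(pq & finals) == 1}
    # Step 2: repeat full passes until a pass marks nothing new.
    changed = True
    while changed:
        changed = False
        for pq in pairs:
            if pq in distinct:
                continue
            p, q = tuple(pq)
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                # Identical successors (a 1-element set) are never distinct.
                if len(succ) == 2 and succ in distinct:
                    distinct.add(pq)
                    changed = True
                    break
    return distinct

# Hypothetical 3-state DFA over {a}: 0 -> 1 -> 2 -> 1, final states 1 and 2.
delta = {("0", "a"): "1", ("1", "a"): "2", ("2", "a"): "1"}
marked = distinct_pairs(["0", "1", "2"], ["a"], delta, ["1", "2"])
# {0,1} and {0,2} are marked; {1,2} is not, so states 1 and 2 can be merged.
```

Every pair left unmarked at the end is a pair of equivalent states, which is what the minimized DFA merges.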
I have the DFA tuple elements formatted like so:
States:
["0";"1";"2";"3";"4";"5"]
Alphabet:
["a";"b"]
Transition function (i.e., ["0";"a";"1"] is read as "0 on an 'a' goes to 1"):
[["0";"a";"1"];["1";"a";"1"];["1";"b";"2"];["2";"a";"0"];...;["5";"a";"4"]]
Start State:
["0"]
Final States:
["1";"5"]
I also have a distinct table formatted. It's basically the Cartesian product States × States (Q × Q in the minimization algorithm above), keeping each unordered pair of different states only once:
[["0";"1"];["0";"2"];["0";"3"];["0";"4"];["0";"5"];["1";"2"];
["1";"3"];["1";"4"];["1";"5"];["2";"3"];["2";"4"];["2";"5"];
["3";"4"];["3";"5"];["4";"5"]]
My initial strategy was to make a new list containing only the pairs that are either both non-final or both final (the only pairs that Step 1 does not mark).
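A minimal Python sketch of that data preparation follows (the thread mixes F# and Erlang, so Python is used for illustration). It assumes six states "0" to "5", as the distinct table suggests, and uses only the transitions visible in the question; the helper names `to_lookup` and `step1_split` are hypothetical. The point is that once the triples are turned into a dictionary keyed by (state, symbol), comparing a pair on a letter is just two lookups.

```python
from itertools import combinations

def to_lookup(transitions):
    """[["0","a","1"], ...] -> {("0","a"): "1", ...} for O(1) lookups of δ(p, a)."""
    return {(src, sym): dst for src, sym, dst in transitions}

def step1_split(states, finals):
    """Split the unordered pairs into (marked, unmarked) according to Step 1."""
    final_set = set(finals)
    marked, unmarked = [], []
    for p, q in combinations(states, 2):   # every unordered pair exactly once
        if (p in final_set) != (q in final_set):
            marked.append((p, q))          # exactly one state is final
        else:
            unmarked.append((p, q))        # both final or both non-final
    return marked, unmarked

# Only the transitions visible in the question (the "..." part is omitted).
delta = to_lookup([["0", "a", "1"], ["1", "a", "1"], ["1", "b", "2"],
                   ["2", "a", "0"], ["5", "a", "4"]])
marked, unmarked = step1_split(["0", "1", "2", "3", "4", "5"], ["1", "5"])

# Comparing the pair ("1", "5") on 'a' is two dictionary lookups:
successors = (delta[("1", "a")], delta[("5", "a")])   # ("1", "4")
print(tuple(sorted(successors)) in marked)             # True
```

Because ("1", "4") was marked in Step 1, the pair ("1", "5") gets marked in Step 2.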
My problem is this: I am having a difficult time finding a way to compare each pair in the resulting list against the transition function. For example, take the pair of states ["1";"5"]. As the algorithm states, we must compare what happens to '1' for each alphabet character to what happens to '5' for each alphabet character. In this case, the transition function states:
For 1:
["1";"a";"1"];["1";"b";"2"]
-'1' on an 'a' goes to '1'
-'1' on a 'b' goes to '2'
And for 5:
["5";"a";"4"]
-'5' on an 'a' goes to '4'
Because '1' on an 'a' goes to '1' (a final state) while '5' on an 'a' goes to '4' (a non-final state), the pair (1, 4) was already marked in Step 1, so '5' and '1' are distinct. But, as I've stated, I'm not at all clear as to how to implement this comparison.
Any help would be greatly appreciated. Take care!

If your transition function is stored as triples as you show above:
1. Sort them into separate lists of state pairs according to letter.
2. Make a random access array (or associative map) out of each such list. Since you are inverting the transition function, the index/key of each mapping will be the destination state, and the contents/value will be a list of zero or more origin states that the letter maps to it.
3. Make sure that all elements of your "distinct" list have their first element less than their second (by swapping the two if necessary), and make sure the list itself is sorted. Then, for each array or map:
   a. Apply the mapping to your "distinct" list as follows:
      - look up both elements of each "distinct" pair
      - perform the cartesian product of the two "origin state" lists from a given pair
      - combine the resulting pairs from all cartesian products into a single list
   b. For this new list:
      - filter out any resulting tuples with identical elements
      - swap any tuples whose first element is greater than the second
      - sort the result
      - eliminate neighboring duplicates
   c. Do a merge pass with the "distinct" list:
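The inverted-transition approach in this answer can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the description of the final "merge pass" is cut off, so here it is assumed to mean adding the new pairs to the distinct set and repeating from just the newly found pairs until nothing new appears. A Python set replaces the answer's "sort and eliminate neighboring duplicates" steps, the names `inverse_maps` and `propagate_distinct` are hypothetical, and the transition function is assumed to be total.

```python
from collections import defaultdict
from itertools import product

def inverse_maps(transitions):
    """For each letter, map destination state -> list of origin states."""
    inv = defaultdict(lambda: defaultdict(list))
    for src, sym, dst in transitions:
        inv[sym][dst].append(src)
    return inv

def propagate_distinct(distinct, transitions):
    """Grow a set of ordered pairs (p < q) until no new distinct pairs appear."""
    inv = inverse_maps(transitions)
    distinct = set(distinct)
    frontier = set(distinct)              # pairs whose predecessors are unexplored
    while frontier:
        new_pairs = set()
        for mapping in inv.values():      # one inverse map per letter
            for p, q in frontier:
                # Cartesian product of the two origin-state lists.
                for r, s in product(mapping.get(p, []), mapping.get(q, [])):
                    if r == s:
                        continue          # drop tuples with identical elements
                    pair = (r, s) if r < s else (s, r)   # first element smaller
                    if pair not in distinct:
                        new_pairs.add(pair)
        distinct |= new_pairs             # the assumed "merge pass"
        frontier = new_pairs
    return distinct

# Hypothetical DFA over {a}: 0 -> 1 -> 2 -> 2, final state 2.
trans = [["0", "a", "1"], ["1", "a", "2"], ["2", "a", "2"]]
step1 = {("0", "2"), ("1", "2")}          # pairs with exactly one final state
print(sorted(propagate_distinct(step1, trans)))
# [('0', '1'), ('0', '2'), ('1', '2')]
```

Only newly discovered pairs are pushed back through the inverse maps, so each pair's predecessors are examined once rather than on every pass over the whole table.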


Does Erlang have a hidden rownum on a list?

This is an example of my current code:
DataSet = [1,2,3,4,5,6,7,8,9].
Sequence = [3,4,5,6].
ReducedDataSet = lists:foldl(
    fun(SeqNumber, Output) ->
        Row = lists:nth(SeqNumber, DataSet),
        [Row|Output]
    end,
    [],
    Sequence
).
ReducedDataSet ends up as [6,5,4,3] and if I change it to lists:foldr, ReducedDataSet would be [3,4,5,6].
I didn't expect this: reading left to right, the 3rd value is 3 and it should proceed to 6, but reading right to left, the 3rd value would be 7 and it should proceed to 4.
Does this mean there's a hidden row number on my list, and foldl and foldr only differ in the sort order of the final list?
I think this is a more general fold question.
In general, fold performs the following: (new_element, acc) -> new_acc
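If the operation new_element ° acc is commutative and associative (e.g. the sum), foldl and foldr give the same result.
If the operation is "append", there is a difference between appending the element to the left or to the right:
[3] ° 4 -> [3, 4] VS 4 ° [3] -> [4, 3]
In Erlang, left/right refers to the direction in which the list is traversed: lists:foldl/3 starts at the head (leftmost element) and lists:foldr/3 starts at the tail (rightmost element). Your fun prepends each Row to the accumulator, so foldl, which visits 3 first, ends with [6,5,4,3], while foldr, which visits 6 first, ends with [3,4,5,6].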
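To make the traversal order concrete, here is a small Python model of the two folds (Python is used only for illustration; `foldl` and `foldr` below are hand-written stand-ins that mimic Erlang's `lists:foldl/3` and `lists:foldr/3` argument order, `Fun(Elem, Acc)`). It replays the question's example and also shows why commutativity alone is not enough for the two folds to agree.

```python
def foldl(fun, acc, xs):
    """Mimics Erlang lists:foldl/3: visit xs from head to tail."""
    for x in xs:
        acc = fun(x, acc)
    return acc

def foldr(fun, acc, xs):
    """Mimics Erlang lists:foldr/3: visit xs from tail to head."""
    for x in reversed(xs):
        acc = fun(x, acc)
    return acc

data_set = [1, 2, 3, 4, 5, 6, 7, 8, 9]
sequence = [3, 4, 5, 6]

def prepend_row(seq_number, output):
    row = data_set[seq_number - 1]   # lists:nth/2 is 1-based
    return [row] + output            # [Row | Output]

print(foldl(prepend_row, [], sequence))  # [6, 5, 4, 3]
print(foldr(prepend_row, [], sequence))  # [3, 4, 5, 6]

# |x - acc| is commutative but not associative, and the folds disagree:
print(foldl(lambda x, acc: abs(x - acc), 0, [1, 2, 3]))  # 2
print(foldr(lambda x, acc: abs(x - acc), 0, [1, 2, 3]))  # 0
```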
TL;DR
No, there is no hidden index or "row number" in an Erlang list.
Discussion
It may be helpful to explore the nature of list operations a bit more in the context of functional lists of the "lists are a bunch of conses" variety.
I wrote an explanation of folds a while back that might be useful to you: Explanation of lists:fold function
Keep in mind that functional lists only have pointers that go one way; that is, they are singly linked lists. There is no concept of a "rownum" or "index" as there would be in a C-style array. Each call to lists:nth/2 actually traverses the list to the nth element before returning that element.
We could write lists:nth/2 like this if we want a version that crashes on bad input (and, looking it up, it turns out that it is written almost exactly like this):
nth(1, [Element | _]) ->
    Element;
nth(N, [_ | Rest]) when N > 1 ->
    nth(N - 1, Rest).
(As a side note, consider not inlining funs that require you to write multi-line definitions as function arguments...)

Polish to infix notation

Let's say we have an expression in prefix notation, or(1) and A B or(2) or(3) C D E (where A, B, C, D, E are boolean values and the ORs are numbered for convenience), that we want to convert to infix notation. In principle I have two ways to evaluate it:
(1) start at or(3) C D, then or(2), then and, then or(1)
(2) start at and A B then check or(3), or(2). Lastly check or(1)
(1) Evaluate starting from the rightmost operator.
(2) Evaluate starting from the leftmost operator that has all of its operands as direct neighbors.
Both evaluations yield (A and B) or C or D or E.
Which evaluation sequence is correct?
Will these two evaluations ever give different results for the same prefix expression?
http://www.cs.man.ac.uk/~pjj/cs212/fix.html recommends the first method.
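You will get the same result regardless of the order, so it is up to you. Each operator in a prefix expression takes a fixed number of operands, so the expression has exactly one parse tree, and any valid evaluation order reconstructs that same tree.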
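Both orders can be checked with a short Python sketch (Python is used for illustration only). It assumes that "and" and "or" are the only operators and that both are binary. The right-to-left stack scan finishes or(3), or(2), and, or(1) in that order, as in method (1); the left-to-right recursive parse finishes and, or(3), or(2), or(1), the order of method (2). Both produce the same fully parenthesised string.

```python
OPERATORS = {"and", "or"}   # binary operators only (an assumption of this sketch)

def prefix_to_infix_rtl(tokens):
    """Right-to-left scan with a stack (the order of method (1))."""
    stack = []
    for tok in reversed(tokens):
        if tok in OPERATORS:
            if len(stack) < 2:
                raise ValueError("operator without two operands: " + tok)
            left = stack.pop()
            right = stack.pop()
            stack.append(f"({left} {tok} {right})")
        else:
            stack.append(tok)
    if len(stack) != 1:
        raise ValueError("malformed prefix expression")
    return stack[0]

def prefix_to_infix_ltr(tokens):
    """Left-to-right recursive parse: each operator reads its two operands in turn."""
    it = iter(tokens)

    def parse():
        tok = next(it)
        if tok in OPERATORS:
            left = parse()
            right = parse()
            return f"({left} {tok} {right})"
        return tok

    result = parse()
    if next(it, None) is not None:
        raise ValueError("trailing tokens")
    return result

expr = "or and A B or or C D E".split()
print(prefix_to_infix_rtl(expr))   # ((A and B) or ((C or D) or E))
print(prefix_to_infix_ltr(expr))   # ((A and B) or ((C or D) or E))
```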

Erlang implementing an amb operator.

Wikipedia says that you can use call/cc to implement the amb operator for nondeterministic choice. My question is: how would you implement the amb operator in a language whose only support for continuations is writing in continuation-passing style, such as Erlang?
If you can encode the constraints for what constitutes a successful solution or choice as guards, list comprehensions can be used to generate solutions. For example, the list comprehension documentation shows an example of solving Pythagorean triples, which is a problem frequently solved using amb (see for example exercise 4.35 of SICP, 2nd edition). Here's the more efficient solution, pyth1/1, shown on the list comprehensions page:
pyth1(N) ->
    [ {A,B,C} ||
        A <- lists:seq(1,N-2),
        B <- lists:seq(A+1,N-1),
        C <- lists:seq(B+1,N),
        A+B+C =< N,
        A*A+B*B == C*C
    ].
One important aspect of amb is efficiently searching the solution space, which is done here by generating possible values for A, B, and C with lists:seq/2 and then constraining and testing those values with guards. Note that the page also shows a less efficient solution named pyth/1 where A, B, and C are all generated identically using lists:seq(1,N); that approach generates all permutations but is slower than pyth1/1 (for example, on my machine, pyth(50) is 5-6x slower than pyth1(50)).
If your constraints can't be expressed as guards, you can use pattern matching and try/catch to deal with failing solutions. For example, here's the same algorithm in pyth/1 rewritten as regular functions triples/1 and the recursive triples/5:
-module(pyth).
-export([triples/1]).

triples(N) ->
    triples(1,1,1,N,[]).

triples(N,N,N,N,Acc) ->
    lists:reverse(Acc);
triples(N,N,C,N,Acc) ->
    triples(1,1,C+1,N,Acc);
triples(N,B,C,N,Acc) ->
    triples(1,B+1,C,N,Acc);
triples(A,B,C,N,Acc) ->
    NewAcc = try
                 true = A+B+C =< N,
                 true = A*A+B*B == C*C,
                 [{A,B,C}|Acc]
             catch
                 error:{badmatch,false} ->
                     Acc
             end,
    triples(A+1,B,C,N,NewAcc).
We're using pattern matching for two purposes:
In the function heads, to control values of A, B and C with respect to N and to know when we're finished
In the body of the final clause of triples/5, to assert that conditions A+B+C =< N and A*A+B*B == C*C match true
If both conditions match true in the final clause of triples/5, we insert the solution into our accumulator list, but if either fails to match, we catch the badmatch error and keep the original accumulator value.
Calling triples/1 yields the same triples as pyth/1 (including permutations such as {4,3,5}, which pyth1/1 omits), though in a different order, and it runs at about half the speed of pyth/1. Even so, with this approach any constraint could be encoded as a normal function and tested for success within the try/catch expression.

Testing intersection of two regular languages

I want to test whether two languages have a string in common. Both of these languages are from a subset of regular languages described below and I only need to know whether there exists a string in both languages, not produce an example string.
The language is specified by a glob-like string like
/foo/**/bar/*.baz
where ** matches 0 or more characters, and * matches zero or more characters that are not /, and all other characters are literal.
Any ideas?
thanks,
mike
EDIT:
I implemented something that seems to perform well, but have yet to try a correctness proof. You can see the source and unit tests
Build FAs A and B for both languages, and construct the "intersection FA" AnB. If AnB has at least one accepting state accessible from the start state, then there is a word that is in both languages.
Constructing AnB could be tricky, but I'm sure there are FA textbooks that cover it. The approach I would take is:
The states of AnB are the cartesian product of the states of A and B respectively. A state in AnB is written (a, b) where a is a state in A and b is a state in B.
A transition (a, b) ->r (c, d) (meaning, there is a transition from (a, b) to (c, d) on symbol r) exists iff a ->r c is a transition in A, and b ->r d is a transition in B.
(a, b) is a start state in AnB iff a and b are start states in A and B respectively.
(a, b) is an accepting state in AnB iff each is an accepting state in its respective FA.
This is all off the top of my head, and hence completely unproven!
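The product construction described above can be explored lazily, building only the (a, b) states reachable from the start and stopping at the first one that is accepting in both machines. Below is a minimal Python sketch under stated assumptions: each automaton is a hypothetical dictionary with "start", "accept" and "delta" keys, where delta maps a state to {symbol: set of next states}, and there are no epsilon moves. Compiling the glob patterns (with `*` and `**`) into such automata is not shown.

```python
from collections import deque

def intersection_nonempty(a, b):
    """Return True if some word is accepted by both automata a and b.

    Each automaton is {"start": set, "accept": set,
                       "delta": {state: {symbol: set_of_states}}},
    with no epsilon moves (an assumption of this sketch).
    """
    start = [(p, q) for p in a["start"] for q in b["start"]]
    seen = set(start)
    queue = deque(start)
    while queue:                                   # breadth-first search over AnB
        p, q = queue.popleft()
        if p in a["accept"] and q in b["accept"]:
            return True                            # reachable accepting state of AnB
        a_moves = a["delta"].get(p, {})
        b_moves = b["delta"].get(q, {})
        for sym in a_moves.keys() & b_moves.keys():   # symbols both can read
            for p2 in a_moves[sym]:
                for q2 in b_moves[sym]:
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        queue.append((p2, q2))
    return False

# Hypothetical automata over {x, y}.
ends_in_x = {"start": {0}, "accept": {1},
             "delta": {0: {"x": {1}, "y": {0}}, 1: {"x": {1}, "y": {0}}}}
exactly_y = {"start": {0}, "accept": {1}, "delta": {0: {"y": {1}}}}
has_y = {"start": {0}, "accept": {1},
         "delta": {0: {"x": {0}, "y": {1}}, 1: {"x": {1}, "y": {1}}}}

print(intersection_nonempty(ends_in_x, exactly_y))  # False
print(intersection_nonempty(ends_in_x, has_y))      # True ("yx" is in both)
```

Because the search only ever visits reachable product states, it avoids enumerating strings and runs in time proportional to the size of the reachable part of AnB.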
I just did a quick search and this problem is decidable (i.e., it can be done), but I don't know of any good algorithms for it. One solution is:
Convert both regular expressions to NFAs A and B
Create a NFA, C, that represents the intersection of A and B.
Now try every string of length 0 up to (but not including) the number of states in C, and see whether C accepts any of them (if C accepts a longer string, its run must repeat a state, and cutting out that loop gives a shorter accepted string).
I know this might be a little hard to follow, but this is the only way I know how.

refactoring boolean equation

Let's say you have a Boolean rule/expression like so
(A OR B) AND (D OR E) AND F
You want to convert it into as many AND only expressions as possible, like so
A AND D AND F
A AND E AND F
B AND D AND F
B AND E AND F
You are just reducing the ORs, so it becomes
(A AND D AND F) OR (A AND E AND F) OR (...)
Is there a property in Boolean algebra that would do this?
Take a look at De Morgan's theorem. The link points to a document relating to electronic gates, but the theory remains the same.
It says that any logical binary expression remains unchanged if we
Change all variables to their complements.
Change all AND operations to ORs.
Change all OR operations to ANDs.
Take the complement of the entire expression.
(quoting from the above linked document)
Your example is exploiting the distributivity of AND over OR, as shown here.
All you need to do is apply that successively. For example, using x*(y+z) = (x*y)+(x*z) (where * denotes AND and + denotes OR):
0. (A + B) * (D + E) * F
1. Apply to the first 2 brackets results in ((A+B)*D)+((A+B)*E)
2. Apply to content of each bracket results in (A*D+B*D)+(A*E+B*E)
3. So now you have ((A*D+B*D)+(A*E+B*E))*F
4. Applying the law again results in (A*D+B*D)*F+(A*E+B*E)*F
5. Apply one more time results in A*D*F+B*D*F+A*E*F+B*E*F, QED
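Applying the distributive law repeatedly amounts to picking one literal from each OR-group in every possible way, which is exactly a Cartesian product. A minimal Python sketch of that idea follows; the helper names `expand` and `same_truth_table` are hypothetical, and the input is assumed to already be an AND of OR-groups of plain variables, as in the question.

```python
from itertools import product

def expand(clauses):
    """Distribute AND over OR.

    clauses: list of OR-groups, e.g. [["A", "B"], ["D", "E"], ["F"]] for
    (A OR B) AND (D OR E) AND F. Returns the AND-only terms of the result.
    """
    return [list(term) for term in product(*clauses)]

def same_truth_table(clauses, terms):
    """Brute-force check that the AND-of-ORs equals the OR-of-ANDs."""
    names = sorted({v for clause in clauses for v in clause})
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        before = all(any(env[v] for v in clause) for clause in clauses)
        after = any(all(env[v] for v in term) for term in terms)
        if before != after:
            return False
    return True

clauses = [["A", "B"], ["D", "E"], ["F"]]
terms = expand(clauses)
print(terms)
# [['A', 'D', 'F'], ['A', 'E', 'F'], ['B', 'D', 'F'], ['B', 'E', 'F']]
print(same_truth_table(clauses, terms))  # True
```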
You may be interested in reading about Karnaugh maps. They are a tool for simplifying boolean expressions, but you could use them to determine all of the individual expressions as well. I'm not sure how you might generalize this into an algorithm you could write a program for though.
You might be interested in conjunctive normal form or its counterpart, disjunctive normal form.
As far as I know, Boolean algebra cannot be built from AND and OR operations alone.
If you have only these two operations, you cannot obtain the NOT operation.
You can convert any expression to a functionally complete set of Boolean operations.
Here are some complete sets:
AND and NOT
OR and NOT
Assuming you can use the NOT operation, you can rewrite any Boolean expression using NOT plus only ANDs, or NOT plus only ORs. In your case:
(A OR B) AND (D OR E) AND F
I tend to use engineering shorthand for the above and write:
AND as a product (. or nothing);
OR as a sum (+); and
NOT as a single quote (').
So:
(A+B)(D+E)F
The analogy with arithmetic is actually quite useful for factoring terms.
By De Morgan's Law:
(A+B) => (A'B')'
So you can rewrite your expression as:
(A+B)(D+E)F
(A'B')'(D'E')'F
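A quick way to confirm such a rewrite is to compare both forms on all 2^5 input combinations. The short Python check below does that; the function names are hypothetical and Python's `not`/`and`/`or` stand in for ', product and sum.

```python
from itertools import product

def original(a, b, d, e, f):
    # (A+B)(D+E)F
    return (a or b) and (d or e) and f

def de_morgan_form(a, b, d, e, f):
    # (A'B')'(D'E')'F : each OR replaced by NOT(AND of NOTs)
    return (not ((not a) and (not b))) and (not ((not d) and (not e))) and f

def equivalent():
    """True if both forms agree on all 32 combinations of inputs."""
    return all(original(*v) == de_morgan_form(*v)
               for v in product([False, True], repeat=5))

print(equivalent())  # True
```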
