How do you write an n-ary tree in postfix notation? - parsing

I am trying to understand this paper, Tree template matching in ranked ordered trees by pushdown automata. The first step is having the tree in postfix notation.
How do I take a tree such as this:
foo
bar
abc
def
bar
abc
a
b
a
b
c
d
e
def
abc
baz
bar
abc
a
b
c
abc
def
And write that in postfix notation?

It doesn't make a lot of sense. However, you can either use parentheses:
...(abc a b c)bar abc def)baz)foo
Or specify the number of operands with each operator:
... abc a b c bar4 abc def baz3 foo3
or even:
... abc0 a0 b0 c0 bar4 abc0 def0 baz3 foo3
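As a rough sketch, the arity-annotated forms above can be produced by a post-order traversal. The `(label, children)` tuple representation and the function names are assumptions made for illustration; the example tree is only the `baz` subtree that the answer spells out.

```python
def postfix(node, annotate_leaves=False):
    """Return the postfix token list for a (label, children) tree."""
    label, children = node
    tokens = []
    # Emit every subtree first, left to right.
    for child in children:
        tokens.extend(postfix(child, annotate_leaves))
    # Then emit the node itself, suffixed with its number of children.
    if children or annotate_leaves:
        tokens.append(f"{label}{len(children)}")
    else:
        tokens.append(label)
    return tokens


def leaf(label):
    """Hypothetical helper: a node without children."""
    return (label, [])


# The 'baz' subtree from the answer: baz(bar(abc, a, b, c), abc, def)
baz = ("baz", [
    ("bar", [leaf("abc"), leaf("a"), leaf("b"), leaf("c")]),
    leaf("abc"),
    leaf("def"),
])

print(" ".join(postfix(baz)))
# prints: abc a b c bar4 abc def baz3
print(" ".join(postfix(baz, annotate_leaves=True)))
# prints: abc0 a0 b0 c0 bar4 abc0 def0 baz3
```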

In the terms of that paper, the tree you are asking about is impossible because you have nodes with the same "symbol" (name) with different numbers of children. The paper, however, is assuming that every symbol in the alphabet has a specified "arity" (the number of children for a node labelled with that symbol). Leaf symbols have arity 0, by the way.
This is (very briefly) mentioned in the Basic Definitions section at the beginning:
A ranked alphabet is a couple 𝒜 = (Σ, φ), where Σ is an alphabet and φ is a mapping φ : Σ → ℕ. The arity (rank) of a symbol x ∈ Σ is φ(x).
In other words, there is a mathematical function which tells you how many children a labelled node will have, which you can use in the postfix notation to know how many subtrees precede that symbol. (Note also that 𝒜, which includes the arity function, is part of their definition of a PDA.)
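To illustrate why a fixed arity per symbol is enough, here is a minimal sketch that rebuilds a tree from plain postfix tokens using a stack. The ranked alphabet `{"f": 2, "g": 1, "a": 0}` and the function name are made up for the example and are not taken from the paper.

```python
def parse_postfix(tokens, arity):
    """Rebuild a (label, children) tree from postfix tokens.

    `arity` plays the role of the mapping φ: it gives the number of
    children for every symbol.
    """
    stack = []
    for symbol in tokens:
        n = arity[symbol]
        if n > len(stack):
            raise ValueError(f"not enough subtrees before {symbol!r}")
        # The last n finished subtrees are this node's children.
        split = len(stack) - n
        children = stack[split:]
        del stack[split:]
        stack.append((symbol, children))
    if len(stack) != 1:
        raise ValueError("tokens do not form a single tree")
    return stack[0]


# Hypothetical ranked alphabet: f is binary, g is unary, a is a leaf.
arity = {"f": 2, "g": 1, "a": 0}
tree = parse_postfix("a a g f".split(), arity)
print(tree)
# prints: ('f', [('a', []), ('g', [('a', [])])])
```

Because the arity is looked up from the symbol alone, no parentheses or numeric suffixes are needed in the token stream.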

Related

How many equivalence classes in the RL relation for {w in {a, b}* | (#a(w) mod m) = ((#b(w)+1) mod m)}

How many equivalence classes in the RL relation for
{w in {a, b}* | (#a(w) mod m) = ((#b(w)+1) mod m)}
I am looking at a past test question which gives me the options
m(m+1)
2m
m^2
m^2+1
infinite
However, I claim that it's m, and I came up with an automaton that I believe accepts this language which contains 3 states (for m=3).
Am I right?
Actually you're right. To see this, observe that the difference of #a(w) and #b(w), #a(w) - #b(w) modulo m, is all that matters; and there are only m possible values of this difference modulo m. So, m states are always sufficient to accept a language of this form: simply make the state corresponding to the appropriate difference the accepting state.
In your DFA, a2 corresponds to a difference of zero, a1 to a difference of one and a3 to a difference of two.
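The argument can be checked with a small sketch: a simulated DFA whose only state is the difference #a(w) - #b(w) modulo m, compared by brute force against the definition of the language. The function names and the length bound in the check are arbitrary choices for illustration.

```python
from itertools import product


def accepts(word, m):
    """Simulate the m-state DFA; the state is (#a - #b) mod m."""
    state = 0
    for ch in word:
        if ch == "a":
            state = (state + 1) % m
        elif ch == "b":
            state = (state - 1) % m
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    # #a mod m == (#b + 1) mod m  is the same as  (#a - #b) mod m == 1 mod m
    return state == 1 % m


def in_language(word, m):
    """Membership test taken directly from the definition."""
    return word.count("a") % m == (word.count("b") + 1) % m


def agree_up_to(length, m):
    """Compare both tests on every word over {a, b} up to the given length."""
    return all(
        accepts("".join(w), m) == in_language("".join(w), m)
        for n in range(length + 1)
        for w in product("ab", repeat=n)
    )


print(agree_up_to(8, 3))
# prints: True
```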

Polish to infix notation

Let's say we have an expression in prefix notation, or(1) and A B or(2) or(3) C D E (where A, B, C, D, E are boolean values and the or operators are numbered for convenience), that we want to convert to infix notation. In principle I have two ways to evaluate it:
(1) start at or(3) C D, then or(2), then and, then or(1)
(2) start at and A B then check or(3), or(2). Lastly check or(1)
(1) Evaluate starting from the rightmost operator
(2) Evaluate starting from the leftmost operator that has all of its operands as its direct neighbors.
Both evaluations yield (A and B) or C or D or E.
Which evaluation sequence is correct?
Will these two evaluations ever give different result for the same prefix record?
http://www.cs.man.ac.uk/~pjj/cs212/fix.html recommends the first method.
You will get the same result regardless of the order, so it is up to you.
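A minimal sketch of both directions, assuming every operator is binary and tokens are separated by spaces (the function names are hypothetical). The first function scans from the right with a stack; the second reads from the left recursively. Both produce the same fully parenthesized infix string for the expression in the question.

```python
OPERATORS = {"and", "or"}


def prefix_to_infix_right(tokens):
    """Scan right to left, pushing operands and combining on operators."""
    stack = []
    for tok in reversed(tokens):
        if tok in OPERATORS:
            left = stack.pop()   # nearest operand is the left one
            right = stack.pop()
            stack.append(f"({left} {tok} {right})")
        else:
            stack.append(tok)
    if len(stack) != 1:
        raise ValueError("malformed prefix expression")
    return stack[0]


def prefix_to_infix_left(tokens):
    """Read left to right: an operator is followed by its two operands."""
    it = iter(tokens)

    def parse():
        tok = next(it)
        if tok in OPERATORS:
            left = parse()
            right = parse()
            return f"({left} {tok} {right})"
        return tok

    result = parse()
    if next(it, None) is not None:
        raise ValueError("trailing tokens after expression")
    return result


expr = "or and A B or or C D E".split()
print(prefix_to_infix_right(expr))
# prints: ((A and B) or ((C or D) or E))
print(prefix_to_infix_left(expr) == prefix_to_infix_right(expr))
# prints: True
```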

DFA minimization in F#

I'm completely new to functional programming and have elected to use F# for a project which entails the parsing and minimization of a DFA.
I currently have my parser completed and am able to format each element of the DFA's tuple (states, alphabet, transition function, start state, final states) in whatever way I'd like and I have reached the point where I need to implement the minimization algorithm. The algorithm being used is:
For some DFA (Q, Σ, δ, S, F) where
Q: The set of states
Σ: The alphabet
δ: The transition function
S: The start state
F: The set of final states
Step 1. For each pair of states
(p, q) ∈ Q x Q
If p ∈ F and q ∉ F (or vice versa), then set distinct(p, q) = 1.
Step 2. Loop until there is no change in the table contents:
For each pair of states (p, q) ∈ Q x Q:
For each alphabet symbol a ∈ alphabet:
If distinct(p, q) = 0 and distinct(δ(p, a), δ(q, a)) = 1, then set
distinct(p, q) = 1.
I have the DFA tuple elements formatted like so:
States:
["0";"1";"2";"3"]
Alphabet:
["a";"b"]
Transition Function (i.e. ["0";"a";"1"] is read as "0 on an 'a' goes to 1")
[["0";"a";"1"];["1";"a";"1"];["1";"b";"2"];["2";"a";"0"];...;["5";"a";"4"]]
Start State:
["0"]
Final States:
["1";"5"]
I also have a distinct table formatted. It's basically the Cartesian product of States x States (QxQ from the above minimization algorithm) with any repeated products and duplicate elements ignored:
[["0";"1"];["0";"2"];["0";"3"];["0";"4"];["0";"5"];["1";"2"];
["1";"3"];["1";"4"];["1";"5"];["2";"3"];["2";"4"];["2";"5"];
["3";"4"];["3";"5"];["4";"5"]]
My initial strategy was to make a new list with only pairs which are either both non-final, or both final. (The only two conditions failing the 'Step 1' condition).
My problem is this: I am having a difficult time coming up with a way to compare the resulting list to the transition function of each pair of states. For example, take the pair of states ["1";"5"]. As the algorithm states, we must compare what happens to '1' for each alphabet character to what happens to '5' for each alphabet character. In this case, the transition function states:
For 1:
["1";"a";"1"];["1";"b";"2"]
-'1' on an 'a' goes to '1'
-'1' on a 'b' goes to '2'
And for 5:
["5";"a";"4"]
-'5' on an 'a' goes to '4'
Because both states, '5' and '1', behave differently when passed the same alphabet character, they are distinct. But, as I've stated, I'm not at all clear as to how to implement this comparison.
Any help would be greatly appreciated. Take care!
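For reference, the two steps of the algorithm quoted above can be sketched as follows. This is written in Python rather than F# purely for illustration. It assumes the transition function is total (every state has a transition on every symbol), and the small DFA at the bottom is a made-up example, not the one from the question, whose transition list is elided.

```python
from itertools import combinations


def distinguishable_pairs(states, alphabet, delta, finals):
    """Table-filling: return the set of distinguishable state pairs."""
    # Step 1: a final and a non-final state are always distinct.
    distinct = {
        frozenset((p, q))
        for p, q in combinations(states, 2)
        if (p in finals) != (q in finals)
    }
    # Step 2: repeat until the table stops changing.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            if frozenset((p, q)) in distinct:
                continue
            for a in alphabet:
                successors = frozenset((delta[p, a], delta[q, a]))
                # A one-element set means both go to the same state.
                if len(successors) == 2 and successors in distinct:
                    distinct.add(frozenset((p, q)))
                    changed = True
                    break
    return distinct


# Hypothetical DFA, given as triples in the same shape as the question.
states = ["0", "1", "2", "3"]
alphabet = ["a", "b"]
triples = [
    ["0", "a", "1"], ["0", "b", "2"],
    ["1", "a", "3"], ["1", "b", "3"],
    ["2", "a", "3"], ["2", "b", "3"],
    ["3", "a", "3"], ["3", "b", "3"],
]
finals = {"3"}

# Turn the triples into a lookup table keyed by (state, symbol).
delta = {(s, a): t for s, a, t in triples}

distinct = distinguishable_pairs(states, alphabet, delta, finals)
equivalent = [
    pair for pair in combinations(states, 2)
    if frozenset(pair) not in distinct
]
print(equivalent)
# prints: [('1', '2')]
```

Converting the list of triples into a dictionary keyed by `(state, symbol)` is what makes the comparison of δ(p, a) and δ(q, a) a direct lookup.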
If your transition function is stored as triples as you show above:
Sort them into separate lists of state pairs according to letter
Make a random access array (or associative map) out of each such list
Since you are inverting the transition function, the index/key of each mapping will be the destination state, and the contents/value will be a list of zero or more origin states that the letter maps to it.
Make sure that all elements of your "distinct" list have their first element less than their second (by swapping the two if necessary), and make sure the list itself is sorted. Then, for each array or map:
Apply the mapping to your "distinct" list as follows:
look up both elements of each "distinct" pair
perform the cartesian product of the two "origin state" lists from a given pair
combine the resulting pairs from all cartesian products into a single list
For this new list:
Filter out any resulting tuples with identical elements
Swap any tuples whose first element is greater than the second
Sort the result
Eliminate neighboring duplicates
Do a merge pass with the "distinct" list:

Recognizing permutations of a finite set of strings in a formal grammar

Goal: find a way to formally define a grammar that recognizes elements from a set 0 or 1 times in any order. Subsequently, I want to parse it and generate an AST as well.
For example: Say the set of valid strings in my language is {A, B, C}. I want to define a grammar that recognizes all valid permutations of any number of those elements.
Syntactically valid strings would include:
(the empty string)
A,
B A, and
C A B
Syntactically invalid strings would include:
A A, and
B A C B
To be clear, defining all possible permutations explicitly in a CFG is unacceptable for my purposes, since larger sets would be impossible to maintain.
From what I understand, such a language fails the pumping lemma for context free languages, so the solution will not be context free or regular.
Update
What I'm after is called a "permutation language", which Benedek Nagy has done some theoretical work on as an extension to context free languages.
Regarding a parser generator, I've only found talk of implementing parsers with a permutation phase (link). Parsers evidently have an exponential lower bound on the size of resulting CFG, and I haven't found any parser generators that support it anyhow.
A sort-of solution to this problem was written in ANTLR. It uses semantic predicates to 'code around' the issue.
Assuming that the set of alternative strings is fixed and known in advance, say of size n, one can come up with a (non context-free) grammar of size O(n!). This is not asymptotically smaller than enumerating all permutations, so I suppose it cannot be considered a good solution. I believe that this grammar can be reformulated as a context-sensitive grammar (although in the form I'm suggesting below it is not).
For the example {a, b, c} mentioned in the question, one such grammar is the following. I'm using lower case letters for terminal symbols and upper case letters for non-terminals, as is customary. S is the initial non-terminal symbol.
S ::= XabcY
XabcY ::= aXbcY | bXacY | cXabY
XabY ::= ab | ba
XacY ::= ac | ca
XbcY ::= bc | cb
Non-terminals X and Y enclose the substring in the production which has not been finalized yet; this substring will eventually be replaced by a permutation of the terminals that are given between X and Y (in some arbitrary order).
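As a sketch of how this scheme extends beyond three symbols, the productions can be generated mechanically. The function name is hypothetical, and the code assumes at least two single-character terminals; left-hand sides are kept as strings of symbols, as in the grammar above.

```python
from itertools import combinations


def permutation_grammar(symbols):
    """Generate the X...Y productions for a set of terminals."""
    symbols = sorted(symbols)

    def pending(subset):
        # X and Y enclose the terminals that still have to be placed.
        return "X" + "".join(subset) + "Y"

    rules = {"S": [pending(symbols)]}
    for size in range(len(symbols), 1, -1):
        for subset in combinations(symbols, size):
            if size == 2:
                a, b = subset
                rules[pending(subset)] = [a + b, b + a]
            else:
                # Pick one terminal to emit next, leave the rest pending.
                rules[pending(subset)] = [
                    x + pending([s for s in subset if s != x])
                    for x in subset
                ]
    return rules


for lhs, alternatives in permutation_grammar("abc").items():
    print(f"{lhs} ::= {' | '.join(alternatives)}")
# prints:
# S ::= XabcY
# XabcY ::= aXbcY | bXacY | cXabY
# XabY ::= ab | ba
# XacY ::= ac | ca
# XbcY ::= bc | cb
```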

refactoring boolean equation

Let's say you have a Boolean rule/expression like so
(A OR B) AND (D OR E) AND F
You want to convert it into as many AND only expressions as possible, like so
A AND D AND F
A AND E AND F
B AND D AND F
B AND E AND F
You are just reducing the OR's so it becomes
(A AND D AND F) OR (A AND E AND F) OR (...)
Is there a property in Boolean algebra that would do this?
Take a look at DeMorgan's theorem. The link points to a document relating to electronic gates, but the theory remains the same.
It says that any logical binary expression remains unchanged if we
Change all variables to their complements.
Change all AND operations to ORs.
Change all OR operations to ANDs.
Take the complement of the entire expression.
(quoting from the above linked document)
Your example is exploiting the distributivity of AND over OR, as shown here.
All you need to do is apply that successively. For example, using x*(y+z) = (x*y)+(x*z) (where * denotes AND and + denotes OR):
0. (A + B) * (D + E) * F
1. Apply to the first 2 brackets results in ((A+B)*D)+((A+B)*E)
2. Apply to content of each bracket results in (A*D+B*D)+(A*E+B*E)
3. So now you have ((A*D+B*D)+(A*E+B*E))*F
4. Applying the law again results in (A*D+B*D)*F+(A*E+B*E)*F
5. Apply one more time results in A*D*F+B*D*F+A*E*F+B*E*F, QED
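Applying the distributive law repeatedly amounts to picking one literal from each bracket in every possible way. A minimal sketch, assuming the expression is given as a list of OR-groups that are ANDed together (the representation and function name are illustrative only):

```python
from itertools import product


def expand_to_and_terms(or_groups):
    """Distribute AND over OR: one literal from each OR-group per term."""
    return [tuple(term) for term in product(*or_groups)]


# (A OR B) AND (D OR E) AND F
groups = [["A", "B"], ["D", "E"], ["F"]]
for term in expand_to_and_terms(groups):
    print(" AND ".join(term))
# prints:
# A AND D AND F
# A AND E AND F
# B AND D AND F
# B AND E AND F
```

This does not simplify the result, so repeated or contradictory literals would need a separate pass.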
You may be interested in reading about Karnaugh maps. They are a tool for simplifying boolean expressions, but you could use them to determine all of the individual expressions as well. I'm not sure how you might generalize this into an algorithm you could write a program for though.
You might be interested in Conjunctive Normal form or its brother, Disjunctive normal form.
As far as I know, Boolean algebra cannot be built using only the AND and OR operations.
With only these two operations you cannot obtain the NOT operation.
You can convert any expression to use a functionally complete set of Boolean operations.
Here are some complete sets:
AND and NOT
OR and NOT
Assuming you can use the NOT operation, you can rewrite any Boolean expression with only ANDs or only ORs. In your case:
(A OR B) AND (D OR E) AND F
I tend to use engineering shorthand for the above and write:
AND as a product (. or nothing);
OR as a sum (+); and
NOT as a single quote (').
So:
(A+B)(D+E)F
The analogy to arithmetic is actually quite useful for factoring terms.
By De Morgan's Law:
(A+B) => (A'B')'
So you can rewrite your expression as:
(A+B)(D+E)F
(A'B')'(D'E')'F
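The rewrite can be sanity-checked with a truth table over all 32 input combinations. This is only a verification sketch; the function names are made up for the example.

```python
from itertools import product


def original(a, b, d, e, f):
    # (A+B)(D+E)F
    return (a or b) and (d or e) and f


def rewritten(a, b, d, e, f):
    # (A'B')'(D'E')'F : only AND and NOT are used
    return (not (not a and not b)) and (not (not d and not e)) and f


def equivalent():
    """Compare both forms on every assignment of the five variables."""
    return all(
        original(*values) == rewritten(*values)
        for values in product([False, True], repeat=5)
    )


print(equivalent())
# prints: True
```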
