Why is this grammar not LL(1) - parsing

I have the next grammar:
C := (PZ)
P := X | C
X := iQ | eQ | rQ
Q := AX | ε
A := +
L := >
Z := LP | AP | ε
and i am using JFLAP to build an LL(1) parsing table, but at the moment that i type those rules, JFLAP throws me an error that says: the grammar is not LL(1).I have found where the mistake is, in the rule 'Q'.
The first set of Q is Q = {+,ε}, and the follow set of Q is Q = { ), + , >} and in the parsing table i am going to have two rules in table[Q,+] and that´s the mistake, but i dont know how to fix it because i need to have the rule Q -> ε

The basic problem is that your grammar is ambiguous -- you have two nested repeating patterns from your rules for X and Z and both of them can match an i+i fragment. So you need to decide how you want to resolve that ambiguity -- which way should a fragment like i+i match:
PZ PZ
/ \ / \
X ε X AP
/ \ / \ / \
i Q i Q + X
/ \ / / \
A X ε i Q
/ / \ |
+ i Q ε
|
ε
The easiest fix is to make it always match the right example, which you can do by just getting rid of the X/Q repeating pattern:
C := (PZ)
P := X | C
X := i | e | r
A := +
L := >
Z := LP | AP | ε
If you want to always match the left example, you need to disallow + in the Z pattern:
C := (PZ)
P := X | C
X := iQ | eQ | rQ
Q := AX | ε
A := +
L := >
Z := LP | ε

Related

Simple refl-based proof problem in Lean (but not in Agda)

In an attempt to define skew heaps in Lean and prove some results, I have defined a type for trees together with a fusion operation:
inductive tree : Type
| lf : tree
| nd : tree -> nat -> tree -> tree
def fusion : tree -> tree -> tree
| lf t2 := t2
| t1 lf := t1
| (nd l1 x1 r1) (nd l2 x2 r2) :=
if x1 <= x2
then nd (fusion r1 (nd l2 x2 r2)) x1 l1
else nd (fusion (nd l1 x1 l1) r2) x2 l2
Then, even for an extremely simple result such as
theorem fusion_lf : ∀ (t : tree), fusion lf t = t := sorry
I'm stuck. I really have no clue for starting to write this proof. If I start like this:
begin
intro t,
induction t with g x d,
refl,
end
I can use refl for the case where t is lf, but not if it is a nd.
I'm a bit at a lost, since in Agda, it is really easy. If I define this:
data tree : Set where
lf : tree
nd : tree -> ℕ -> tree -> tree
fusion : tree -> tree -> tree
fusion lf t2 = t2
fusion t1 lf = t1
fusion (nd l1 x1 r1) (nd l2 x2 r2) with x1 ≤? x2
... | yes _ = nd (fusion r1 (nd l2 x2 r2)) x1 l1
... | no _ = nd (fusion (nd l1 x1 r1) r2) x2 l2
then the previous result is obtained directly with a refl:
fusion_lf : ∀ t -> fusion lf t ≡ t
fusion_lf t = refl
What I have missed?
This proof works.
theorem fusion_lf : ∀ (t : tree), fusion lf t = t :=
λ t, by cases t; simp [fusion]
If you try #print fusion.equations._eqn_1 or #print fusion.equations._eqn_2 and so on, you can see the lemmas that simp [fusion] will use. The case splits are not exactly the same as the case splits in the pattern matching, because the case splits in the pattern matching actually duplicate the case lf lf. This is why I needed to do cases t. Usually the equation lemmas are definitional equalities, but this time they are not, and honestly I don't know why.

How to remove ambiguity in the following grammar?

How to remove ambiguity in following grammar?
E -> E * F | F + E | F
F -> F - F | id
First, we need to find the ambiguity.
Consider the rules for E without F; change F to f and consider it a terminal symbol. Then the grammar
E -> E * f
E -> f + E
E -> f
is ambiguous. Consider f + f * f:
E E
| |
+-------+--+ +-+-+
| | | | | |
E * f f + E
+-+-+ |
| | | +-+-+
f + E E * f
| |
f f
We can resolve this ambiguity by forcing * or + to take precedence. Typically, * takes precedence in the order of operations, but this is totally arbitrary.
E -> f + E | A
A -> A * f | f
Now, the string f + f * f has just one parsing:
E
|
+-+-+
| | |
f + E
|
A
|
+-+-+
A * f
|
f
Now, consider our original grammar which uses F instead of f:
E -> F + E | A
A -> A * F | F
F -> F - F | id
Is this ambiguous? It is. Consider the string id - id - id.
E E
| |
A A
| |
F F
| |
+-----+----+----+ +----+----+----+
| | | | | |
F - F F - F
| | | |
+-+-+ id id +-+-+
F - F F - F
| | | |
id id id id
The ambiguity here is that - can be left-associative or right-associative. We can choose the same convention as for +:
E -> F + E | A
A -> A * F | F
F -> id - F | id
Now, we have only one parsing:
E
|
A
|
F
|
+----+----+----+
| | |
id - F
|
+--+-+
| | |
id - F
|
id
Now, is this grammar ambiguous? It is not.
s will have #(+) +s in it, and we always need to use production E -> F + E exactly #(+) times and then production E -> A once.
s will have #(*) *s in it, and we always need to use production A -> A * F exactly #(*) times and then production E -> F once.
s will have #(-) -s in it, and we always need to use production F -> id - F exactly #(-) times and the production F -> id once.
That s has exactly #(+) +s, #(*) *s and #(-) -s can be taken for granted (the numbers can be zero if not present in s). That E -> A, A -> F and F -> id have to be used exactly once can be shown as follows:
If E -> A is never used, any string derived will still have E, a nonterminal, in it, and so will not be a string in the language (nothing is generated without taking E -> A at least once). Also, every string that can be generated before using E -> A has at most one E in it (you start with one E, and the only other production keeps one E) so it is never possible to use E -> A more than once. So E -> A is used exactly once for all derived strings. The demonstration works the same way for A -> F and F -> id.
That E -> F + E, A -> A * F and F -> id - F are used exactly #(+), #(*) and #(-) times, respectively, is apparent from the fact that these are the only productions that introduce their respective symbols and each introduces one instance.
If you consider the sub-grammars of our resulting grammars, we can prove they are unambiguous as follows:
F -> id - F | id
This is an unambiguous grammar for (id - )*id. The only derivation of (id - )^kid is to use F -> id - F k times and then use F -> id exactly once.
A -> A * F | F
We have already seen that F is unambiguous for the language it recognizes. By the same argument, this is an unambiguous grammar for the language F( * F)*. The derivation of F( * F)^k will require the use of A -> A * F exactly k times and then the use of A -> F. Because the language generated from F is unambiguous and because the language for A unambiguously separates instances of F using *, a symbol not generated by F, the grammar
A -> A * F | F
F -> id - F | id
Is also unambiguous. To complete the argument, apply the same logic to the grammar generating (F + )*A from the start symbol E.
To remove an ambiguity means that you must choose one of all possible ambiguities. This grammar is as simple as it can be, for a mathematical expression.
To make the multiplication with a higher priority than the addition and the subtraction (where the last two have the same priority, but are traditionally computed from left to right) you do that (in ABNF like syntax):
expression = addition
addition = multiplication *(("+" / "-") multiplication)
multiplication = identifier *("*" identifier)
identifier = 'a'-'z'
The idea is as follows:
first create your lowest grammar rule: the identifier
continue with the highest priority operation, in your case multiplication: *
create a rule that has this on its right hand side: X *(P X), where X is the previous rule you have created, and P is your operation sign.
if you have more than one operation with the same priority they must be in a group: (P1 / P2 / ...)
continue to do the last two operations until there are no more operations to add.
add your main rule that uses the latest one.
Then for input like: a+b+c*d+e you get this tree:
More advanced tools will get you a tree that has more than two nodes. That means that all multiplications in one addition will be in a list that you can iterate from any direction.
This grammar is easy to upgrade, and to add parentheses you can do that:
expression = addition
addition = multiplication *(("+" / "-") multiplication)
multiplication = primary *("*" primary)
primary = identifier / "(" expression ")"
identifier = 'a'-'z'
Then for input (a+b)*c you will get this tree:
If you want to add a division, you can modify the multiplication rule like that:
multiplication = primary *(("*" / "/") primary)
These are all detailed trees, there are trees with less details as well, often called abstract syntax trees.

Fixpoint functions in Type Class Instances

I am trying to use a fixpoint style function in the context of a type class instance but it doesn't seem to work. Is there something extra I have to do to make this work? For the time being I've used a hack of moving the function outside the type class and explicitly declaring it Fixpoint. This seems awful, however.
Here's the short example:
Inductive cexp : Type :=
| CAnd : cexp -> cexp -> cexp
| COr : cexp -> cexp -> cexp
| CProp : bool -> cexp.
Class Propable ( A : Type ) := { compile : A -> Prop }.
Instance: Propable cexp :=
{ compile c :=
match c with
| CAnd x y => (compile x) /\ (compile y)
| COr x y => (compile x) \/ (compile y)
| CProp _ => False
end
}.
This fails with:
Error: Unable to satisfy the following constraints:
In environment:
c, x, y : cexp
?Propable : "Propable cexp"
What does one have to do to make this work?
You can use fix to do that:
Instance: Propable cexp :=
{ compile := fix compile c :=
match c with
| CAnd x y => (compile x) /\ (compile y)
| COr x y => (compile x) \/ (compile y)
| CProp _ => False
end
}.
Let me illustrate how one can come up with it. Let's take the following piece of code:
Fixpoint silly n :=
match n with
| 0 => 0
| S n => silly n
end.
Fixpoint here is a vernacular command which makes the definition a little bit easier on the eyes, but it conceals what is going on here. It turns out that what Coq actually does is something like this:
Definition silly' :=
fix self n :=
match n with
| 0 => 0
| S n => self n
end.
You can verify it by using Print silly. after the definition.

LL1 grammar hidden ambiguity

I've been trying to generate a parser table for the following grammar:
S-> LP ; E
LP-> LP ; num | num
E-> num | var | fun ( E )
The need to eliminate left-recursion is evident, so it ends up:
S-> LP ; E
LP-> num LP'
LP'-> ; num LP' | λ
E-> num | var | fun ( E )
The problem comes when finding firsts and follows for LP and LP', where:
First(LP)= num
Follow(LP)= ;
First(LP') = ; λ
Follow(LP') = ;
Since LP' has λ as a first, that means during the creation of the parsing table the productions that would take me from LP' to ; are:
LP'-> ; num LP'
LP'-> λ
Thus creating ambiguity. Is there any way to solve this problem? I've looked into previous questions and eliminating λ was mentioned but I haven't been able to do so in a way that eliminates ambiguity.
Any help is appreciated, thanks in advance.

Follow sets Top-Down parsing

I have a question for the Follow sets of the following rules:
L -> CL'
L' -> epsilon
| ; L
C -> id:=G
|if GC
|begin L end
I have computed that the Follow(L) is in the Follow(L'). Also Follow(L') is in the Follow(L) so they both will contain: {end, $}. However, as L' is Nullable will the Follow(L) contain also the Follow(C)?
I have computed that the Follow(C) = First(L') and also Follow(C) subset Follow(L) = { ; $ end}.
In the answer the Follow(L) and Follow(L') contain only {end, $}, but shouldn't it contain ; as well from the Follow(C) as L' can be null?
Thanks
However, as L' is Nullable will the Follow(L) contain also the Follow(C)?
The opposite. Follow(C) will contain Follow(L). Think of the following sentence:
...Lx...
where X is some terminal and thus is in Follow(L). This could be expanded to:
...CL'x...
and further to:
...Cx...
So what follows L, can also follow C. The opposite is not necessarily true.
To calculate follows, think of a graph, where the nodes are (NT, n) which means non-terminal NT with the length of tokens as follow (in LL(1), n is either 1 or 0). The graph for yours would look like this:
_______
|/_ \
(L, 1)----->(L', 1) _(C, 1)
| \__________|____________/| |
| | |
| | |
| _______ | |
V |/_ \ V V
(L, 0)----->(L', 0) _(C, 0)
\_______________________/|
Where (X, n)--->(Y, m) means the follows of length n of X, depend on follows of length m of Y (of course, m <= n). That is to calculate (X, n), first you should calculate (Y, m), and then you should look at every rule that contains X on the right hand side and Y on the left hand side e.g.:
Y -> ... X REST
take what REST expands to with length n - m for every m in [0, n) and then concat every result with every follow from the (Y, m) set. You can calculate what REST expands to while calculating the firsts of REST, simply by holding a flag saying whether REST completely expands to that first, or partially. Furthermore, add firsts of REST with length n as follows of X too. For example:
S -> A a b c
A -> B C d
C -> epsilon | e | f g h i
Then to find follows of B with length 3 (which are e d a, d a b and f g h), we look at the rule:
A -> B C d
and we take the sentence C d, and look at what it can produce:
"C d" with length 0 (complete):
"C d" with length 1 (complete):
d
"C d" with length 2 (complete):
e d
"C d" with length 3 (complete or not):
f g h
Now we take these and merge with follow(A, m):
follow(A, 0):
epsilon
follow(A, 1):
a
follow(A, 2):
a b
follow(A, 3):
a b c
"C d" with length 0 (complete) concat follow(A, 3):
"C d" with length 1 (complete) concat follow(A, 2):
d a b
"C d" with length 2 (complete) concat follow(A, 1):
e d a
"C d" with length 3 (complete or not) concat follow(A, 0) (Note: follow(X, 0) is always epsilon):
f g h
Which is the set we were looking for. So in short, the algorithm becomes:
Create the graph of follow dependencies
Find the connected components and create a DAG out of it.
Traverse the DAG from the end (from the nodes that don't have any dependency) and calculate the follows with the algorithm above, having calculated firsts beforehand.
It's worth noting that the above algorithm is for any LL(K). For LL(1), the situation is much simpler.

Resources