How to eliminate both direct and indirect left recursion? - parsing

For this grammar:
A -> B a | A a | c
B -> B b | A b | d
How to remove the left recursion?
====== My attempts ======
Eliminate direct left_rec first.
A -> (B a | c) A'
A' -> a A' | ε
B -> (A b | d) B'
B' -> b B' | ε
Then indirect. Put B into A.
A -> A b B' a A ' | d B' A' | c A'
And then I got:
A -> (d B A' | c A') A''
A' -> a A' | ε
A'' -> b B' a A' A'' | ε
This result is quite complicated. How can I use only one A' to represent this?

Related

Сreating LL grammar

I am trying to create a parser for a given language
The input language contains conditional statements if ... then ...
else and if ... then, separated by a symbol ; (semicolon). Condition
statements contain identifiers, comparison signs <,>, =, hexadecimal
numbers, sign assignments (:=). Consider the sequence as hexadecimal
numbers digits and symbols a, b, c, d, e, f starting with a digit (for
example, 89, 45ac, 0abc )
I ended up with this version of the grammar. Also removed left recursion and ambiguity
S -> if E then S S' | I := H
S' -> else S S'' | ; S | $
S'' -> ; S | $
I -> p | q
E -> HE'
E' -> >HE' | <HE' | =HE'
H -> DH'
H' -> LH' | DH' | EPS
L -> a | b | c | d | e | f
D -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Will it be possible to create an LL(1) parser according to this grammar. I am confused by the rules with two non-terminals (for example HE')

How to remove ambiguity in the following grammar?

How to remove ambiguity in following grammar?
E -> E * F | F + E | F
F -> F - F | id
First, we need to find the ambiguity.
Consider the rules for E without F; change F to f and consider it a terminal symbol. Then the grammar
E -> E * f
E -> f + E
E -> f
is ambiguous. Consider f + f * f:
E E
| |
+-------+--+ +-+-+
| | | | | |
E * f f + E
+-+-+ |
| | | +-+-+
f + E E * f
| |
f f
We can resolve this ambiguity by forcing * or + to take precedence. Typically, * takes precedence in the order of operations, but this is totally arbitrary.
E -> f + E | A
A -> A * f | f
Now, the string f + f * f has just one parsing:
E
|
+-+-+
| | |
f + E
|
A
|
+-+-+
A * f
|
f
Now, consider our original grammar which uses F instead of f:
E -> F + E | A
A -> A * F | F
F -> F - F | id
Is this ambiguous? It is. Consider the string id - id - id.
E E
| |
A A
| |
F F
| |
+-----+----+----+ +----+----+----+
| | | | | |
F - F F - F
| | | |
+-+-+ id id +-+-+
F - F F - F
| | | |
id id id id
The ambiguity here is that - can be left-associative or right-associative. We can choose the same convention as for +:
E -> F + E | A
A -> A * F | F
F -> id - F | id
Now, we have only one parsing:
E
|
A
|
F
|
+----+----+----+
| | |
id - F
|
+--+-+
| | |
id - F
|
id
Now, is this grammar ambiguous? It is not.
s will have #(+) +s in it, and we always need to use production E -> F + E exactly #(+) times and then production E -> A once.
s will have #(*) *s in it, and we always need to use production A -> A * F exactly #(*) times and then production E -> F once.
s will have #(-) -s in it, and we always need to use production F -> id - F exactly #(-) times and the production F -> id once.
That s has exactly #(+) +s, #(*) *s and #(-) -s can be taken for granted (the numbers can be zero if not present in s). That E -> A, A -> F and F -> id have to be used exactly once can be shown as follows:
If E -> A is never used, any string derived will still have E, a nonterminal, in it, and so will not be a string in the language (nothing is generated without taking E -> A at least once). Also, every string that can be generated before using E -> A has at most one E in it (you start with one E, and the only other production keeps one E) so it is never possible to use E -> A more than once. So E -> A is used exactly once for all derived strings. The demonstration works the same way for A -> F and F -> id.
That E -> F + E, A -> A * F and F -> id - F are used exactly #(+), #(*) and #(-) times, respectively, is apparent from the fact that these are the only productions that introduce their respective symbols and each introduces one instance.
If you consider the sub-grammars of our resulting grammars, we can prove they are unambiguous as follows:
F -> id - F | id
This is an unambiguous grammar for (id - )*id. The only derivation of (id - )^kid is to use F -> id - F k times and then use F -> id exactly once.
A -> A * F | F
We have already seen that F is unambiguous for the language it recognizes. By the same argument, this is an unambiguous grammar for the language F( * F)*. The derivation of F( * F)^k will require the use of A -> A * F exactly k times and then the use of A -> F. Because the language generated from F is unambiguous and because the language for A unambiguously separates instances of F using *, a symbol not generated by F, the grammar
A -> A * F | F
F -> id - F | id
Is also unambiguous. To complete the argument, apply the same logic to the grammar generating (F + )*A from the start symbol E.
To remove an ambiguity means that you must choose one of all possible ambiguities. This grammar is as simple as it can be, for a mathematical expression.
To make the multiplication with a higher priority than the addition and the subtraction (where the last two have the same priority, but are traditionally computed from left to right) you do that (in ABNF like syntax):
expression = addition
addition = multiplication *(("+" / "-") multiplication)
multiplication = identifier *("*" identifier)
identifier = 'a'-'z'
The idea is as follows:
first create your lowest grammar rule: the identifier
continue with the highest priority operation, in your case multiplication: *
create a rule that has this on its right hand side: X *(P X), where X is the previous rule you have created, and P is your operation sign.
if you have more than one operation with the same priority they must be in a group: (P1 / P2 / ...)
continue to do the last two operations until there are no more operations to add.
add your main rule that uses the latest one.
Then for input like: a+b+c*d+e you get this tree:
More advanced tools will get you a tree that has more than two nodes. That means that all multiplications in one addition will be in a list that you can iterate from any direction.
This grammar is easy to upgrade, and to add parentheses you can do that:
expression = addition
addition = multiplication *(("+" / "-") multiplication)
multiplication = primary *("*" primary)
primary = identifier / "(" expression ")"
identifier = 'a'-'z'
Then for input (a+b)*c you will get this tree:
If you want to add a division, you can modify the multiplication rule like that:
multiplication = primary *(("*" / "/") primary)
These are all detailed trees, there are trees with less details as well, often called abstract syntax trees.

How do I make this grammar LL(1)?

Lets say I have this grammar
E -> T+Ex | F
T -> T*Fy | w
F -> E | z | ε
Now I need to make it LL(1). I've been following the steps but the solution I came up with doesn't seem quite right.
Fist lets eliminate ε-productions
E -> T+Ex | F | T+x
T -> T*Fy | w | T*y
F -> E | z
Now we'll eliminate cycles
E -> T+Ex | T+x | z
T -> T*Fy | w | T*y
F -> T+Ex | T+x | z
No we'll eliminate immediate left recursion
E -> T+Ex | T+x | z
T -> wT'
T' -> *FyT' | *yT' | ε
F -> T+Ex | T+x | z
Finally we'll replace certain RHS productions where T occurred
E -> wT'+Ex | wT'+x | z
T -> wT'
T' -> *FyT' | *yT' | ε
F -> wT'+Ex | wT'+x | z
Now this doesn't seem to be LL(1) to me since the parse table generated by this will have multiple entries for several of the terminals. What do I seem to be missing?
I figured out how to do it, so out last step we have this configuration
E -> wT'+Ex | wT'+x | z
T -> wT'
T' -> *FyT' | *yT' | ε
F -> wT'+Ex | wT'+x | z
We now need to perform left-factorization to remove productions of the form
S -> aB | aC
And make them into the proper form
S -> aA
A -> B | C
Using this we can transform our grammar into
E -> wT'+E' | z
E' -> Ex | x
T -> wT'
T' -> *T'' | ε
T'' -> FyT' | yT'
F -> wT'+F' | z
F' -> Ex | x
Since F and F' are the same as E and E' we can just remove this production so we are left with.
E -> wT'+E' | z
E' -> Ex | x
T -> wT'
T' -> *T'' | ε
T'' -> EyT' | yT'

Simplifying nested pattern matching F#

I am writing a simple expressions parser in F# and for each operator I want only to support a certain number of operands (e.g. two for Modulo, three for If). Here is what I have:
type Operator =
| Modulo
| Equals
| If
let processOperator operands operator =
match operator with
| Modulo ->
match operands with
| [ a:string; b:string ] -> (Convert.ToInt32(a) % Convert.ToInt32(b)).ToString()
| _ -> failwith "wrong number of operands"
| Equals ->
match operands with
| [ a; b ] -> (a = b).ToString()
| _ -> failwith "wrong operands"
| If ->
match operands with
| [ a; b; c ] -> (if Convert.ToBoolean(a) then b else c).ToString()
| _ -> failwith "wrong operands"
I would like to get rid of or simplify the inner list matches. What is the best way to accomplish this? Should I use multiple guards ?
open System
type Operator =
| Modulo
| Equals
| If
let processOperator operands operator =
match (operator, operands) with
| Modulo, [a: string; b] -> string ((int a) % (int b))
| Equals, [a; b] -> string (a = b)
| If, [a; b; c] -> if Convert.ToBoolean(a) then b else c
| _ -> failwith "wrong number of operands"
But I would suggest to move this logic of the operands to the parser, this way you get a clean operator expression, which is more idiomatic and straight forward to process, at the end you'll have something like this:
open System
type Operator =
| Modulo of int * int
| Equals of int * int
| If of bool * string * string
let processOperator = function
| Modulo (a, b) -> string (a % b)
| Equals (a, b) -> string (a = b)
| If (a, b, c) -> if a then b else c
Fold in the operands matching:
let processOperator operands operator =
match operator, operands with
| Modulo, [a; b] -> (Convert.ToInt32(a) % Convert.ToInt32(b)).ToString()
| Equals, [a; b] -> (a = b).ToString()
| If, [ a; b; c ] -> (if Convert.ToBoolean(a) then b else c).ToString()
| _ -> failwith "wrong number of operands"
Better yet, if you can, change the datatype to the following.
type Operator =
| Modulo of string * string
| Equals of string * string
| If of string * string * string
Then in the match, you can no longer fail.

Eliminate this indirect left recursion

The following grammar has left recursion.
X is the start simbol.
X -> Xa | Zb
Z -> Zc | d | Xa
How to remove it? Please explain it step by step.
-- remove Z left recursion--
X -> Xa | Zb
Z -> dZ' | XaZ'
Z' -> cZ' | empty
--next remove Z --
X -> Xa | dZ'b | XaZ'b
Z' -> cZ' | empty
--next factorize X--
X -> XaX' | dZ'b
X'-> Z'b | empty
Z'-> cZ' | empty
--next remove X recursion--
X -> dZ'bX''
X'' -> a X'X'' | empty
X' -> Z'b | empty
Z' -> cZ' | empty

Resources