checking if the following grammar is LR(0) and LR(1) - parsing

I have the following grammar:
S -> A
A -> B | B[*]
B -> [AB]
AB -> *,AB | epsilon
S,A,B,AB are variables and [ ] , * are terminals
Is it:
LR(0)?
LR(1)?
and how can I prove it?

Related

How to handle operator precedence in LALR(1) parser

I was writing an LALR(1) parser for a simple arithmetic grammar
E -> E ('+' | '-') T
E -> T
T -> T ('*' | '/') F
T -> F
F -> '( E ')'
F -> int
How would I handle operator precedence for an expression such as 1 * 1 + 2 so that the 1+2 is evaluated before the multiplication?

How to handle operator precedence in an LL(1) parser

I was writing an LL(1) parser for an expression grammar. I had the following grammar:
E -> E + E
E -> E - E
E -> E * E
E -> E / E
E -> INT
However, this is left recursive and I removed the left recursion with the following grammar:
E -> INT E'
E' -> + INT E'
E' -> - INT E'
E' -> * INT E'
E' -> / INT E'
E' -> ε
If I was to have the expression 1 + 2 * 3, how would the parser know to evaluate the multiplication before the addition?
Try this:
; an expression is an addition
E -> ADD
; an addition is a multiplication that is optionally followed
; by +- and another addition
ADD -> MUL T
T -> PM ADD
T -> ε
PM -> +
PM -> -
; a multiplication is an integer that is optionally followed
; by */ and another multiplication
MUL -> INT G
G -> MD MUL
G -> ε
MD -> *
MD -> /
; an integer is a digit that is optionally followed by an integer
INT -> DIGIT J
J -> INT
J -> ε
; digits
DIGIT -> 0
DIGIT -> 1
DIGIT -> 2
DIGIT -> 3
DIGIT -> 4
DIGIT -> 5
DIGIT -> 6
DIGIT -> 7
DIGIT -> 8
DIGIT -> 9

proving that a grammar is LL(1)

I'm given the following grammar :
S -> A a A b | B b B a
A -> epsilon
B -> epsilon
I know that it's obvious that it's LL(1), but I'm facing troubles constructing the parsing table.. I followed the algorithm word by word to find the first and follow of each non-terminal , correct me if I'm wrong:
First(S) = {a,b}
First(A) = First(B) = epsilon
Follow(S) = {$}
Follow(A) = {a,b}
Follow(B) = {a,b}
when I construct the parsing table, according to the algorithm, I get a conflict under the $ symbol... what the hell am I doing wrong??
a b $
A A-> epsilon
B B-> epsilon
S S -> AaAb
S -> BbBa
is it ok if I get 2 productions under $ or something?? or am I constructing the parsing table wrong? please help I'm new to the compiler course
There is a tiny mistake. Algorithm is as follows from dragon book,
for each rule (S -> A):
for each terminal a in First(A):
add (S -> A) to M[S, a]
if First(A) contains empty:
for each terminal b in Follow(S):
add (S -> A) to M[S, b]
Let's take them one by one.
S -> AaAb. Here, First(AaAb) = {a}. So add S -> AaAb to M[S, a].
S -> BbBa. Here, First(BbBa) = {b}. So add S -> BbBa to M[S, b].
A -> epsilon. Here, Follow(A) = {a, b}. So add A -> epsilon to M[A, a] and M[A, b].
B -> epsilon. Here, Follow(B) = {a, b}. So add B -> epsilon to M[B, a] and M[B, b].

Parsing an expression grammar having function application with parser combinators (left-recursion)

As a simplified subproblem of a parser for a real language, I am trying to implement a parser for expressions of a fictional language which looks similar to standard imperative languages (like Python, JavaScript, and so). Its syntax features the following construct:
integer numbers
identifiers ([a-zA-Z]+)
arithmetic expressions with + and * and parenthesis
structure access with . (eg foo.bar.buz)
tuples (eg (1, foo, bar.buz)) (to remove ambiguity one-tuples are written as (x,))
function application (eg foo(1, bar, buz()))
functions are first class so they can also be returned from other functions and directly be applied (eg foo()() is legal because foo() might return a function)
So a fairly complex program in this language is
(1+2*3, f(4,5,6)(bar) + qux.quux()().quuux)
the associativity is supposed to be
( (1+(2*3)), ( ((f(4,5,6))(bar)) + ((((qux.quux)())()).quuux) ) )
I'm currently using the very nice uu-parsinglib an applicative parser combinator library.
The first problem was obviously that the intuitive expression grammar (expr -> identifier | number | expr * expr | expr + expr | (expr) is left-recursive. But I could solve that problem using the the pChainl combinator (see parseExpr in the example below).
The remaining problem (hence this question) is function application with functions returned from other functions (f()()). Again, the grammar is left recursive expr -> fun-call | ...; fun-call -> expr ( parameter-list ). Any ideas how I can solve this problem elegantly using uu-parsinglib? (the problem should directly apply to parsec, attoparsec and other parser combinators as well I guess).
See below my current version of the program. It works well but function application is only working on identifiers to remove the left-recursion:
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE RankNTypes #-}
module TestExprGrammar
(
) where
import Data.Foldable (asum)
import Data.List (intercalate)
import Text.ParserCombinators.UU
import Text.ParserCombinators.UU.Utils
import Text.ParserCombinators.UU.BasicInstances
data Node =
NumberLiteral Integer
| Identifier String
| Tuple [Node]
| MemberAccess Node Node
| FunctionCall Node [Node]
| BinaryOperation String Node Node
parseFunctionCall :: Parser Node
parseFunctionCall =
FunctionCall <$>
parseIdentifier {- `parseExpr' would be correct but left-recursive -}
<*> parseParenthesisedNodeList 0
operators :: [[(Char, Node -> Node -> Node)]]
operators = [ [('+', BinaryOperation "+")]
, [('*' , BinaryOperation "*")]
, [('.', MemberAccess)]
]
samePrio :: [(Char, Node -> Node -> Node)] -> Parser (Node -> Node -> Node)
samePrio ops = asum [op <$ pSym c <* pSpaces | (c, op) <- ops]
parseExpr :: Parser Node
parseExpr =
foldr pChainl
(parseIdentifier
<|> parseNumber
<|> parseTuple
<|> parseFunctionCall
<|> pParens parseExpr
)
(map samePrio operators)
parseNodeList :: Int -> Parser [Node]
parseNodeList n =
case n of
_ | n < 0 -> parseNodeList 0
0 -> pListSep (pSymbol ",") parseExpr
n -> (:) <$>
parseExpr
<* pSymbol ","
<*> parseNodeList (n-1)
parseParenthesisedNodeList :: Int -> Parser [Node]
parseParenthesisedNodeList n = pParens (parseNodeList n)
parseIdentifier :: Parser Node
parseIdentifier = Identifier <$> pSome pLetter <* pSpaces
parseNumber :: Parser Node
parseNumber = NumberLiteral <$> pNatural
parseTuple :: Parser Node
parseTuple =
Tuple <$> parseParenthesisedNodeList 1
<|> Tuple [] <$ pSymbol "()"
instance Show Node where
show n =
let showNodeList ns = intercalate ", " (map show ns)
showParenthesisedNodeList ns = "(" ++ showNodeList ns ++ ")"
in case n of
Identifier i -> i
Tuple ns -> showParenthesisedNodeList ns
NumberLiteral n -> show n
FunctionCall f args -> show f ++ showParenthesisedNodeList args
MemberAccess f g -> show f ++ "." ++ show g
BinaryOperation op l r -> "(" ++ show l ++ op ++ show r ++ ")"
Looking briefly at the list-like combinators for uu-parsinglib (I'm more familiar with parsec), I think you can solve this by folding over the result of the pSome combinator:
parseFunctionCall :: Parser Node
parseFunctionCall =
foldl' FunctionCall <$>
parseIdentifier {- `parseExpr' would be correct but left-recursive -}
<*> pSome (parseParenthesisedNodeList 0)
This is also equivalent to the Alternative some combinator, which should indeed apply to the other parsing libs you mentioned.
I don't know this library but can show you how to remove left recursion. The standard right recursive expression grammar is
E -> T E'
E' -> + TE' | eps
T -> F T'
T' -> * FT' | eps
F -> NUMBER | ID | ( E )
To add function application you must decide its level of precedence. In most languages I've seen it is highest. So you'd add another layer of productions for function application.
E -> T E'
E' -> + TE' | eps
T -> AT'
T' -> * A T' | eps
A -> F A'
A' -> ( E ) A' | eps
F -> NUMBER | ID | ( E )
Yes this is a hairy-looking grammar and bigger than the left recursive one. That's the price of top-down predictive parsing. If you want simpler grammars use a bottom up parser generator a la yacc.

LR(1) - How to compute Lookaheads

I have trouble understanding how to compute the lookaheads.
Lets say that I have this extend grammar:
S'-> S
S -> L=R | R
L -> *R | i
R -> L
I wrote the State 0 so:
S'-> .S, {$}
S -> .L=R, {$}
S -> .R, {$}
L -> .*R, {=,$}
L -> .i, {=,$}
R -> .L {=,$}
Using many parsing emulator i see that all calculators says:
R -> .L {$}
Why? Can't the R be followed by a "="?

Resources