How to solve shift/reduce conflict using operator precedence? - parsing

So I have this grammar I'm trying to build an LR(1) table for
E' -> E
E -> E + E
E -> E * E
E -> ( E )
E -> a
So far, this my table
I'm trying to solve the conflicts here. I thought about changing the grammar to postfix instead of infix but I'm not really sure if I can do that. Any ideas?

Here is your grammar, with precedence:
E' -> E
E -> E + T
E -> T
T -> T * F
T -> F
F -> ( E )
F -> a
Don't forget the extra E -> T, and T -> F, as without it the grammar will be useless.
Note: This will not work with LR(0), because you'll get a conflict.

Related

LR(1) item sets for left recursive grammar

I read several papers about creating LR(1) item sets, but none of them pertained to left recursive grammars, such as those used for parsing expressions. If I had the following grammar,
E -> E + T | T
T -> T * F | F
F -> (E) | NUMBER
How would I go about creating the LR(1) item sets?
Left recursion isn't inherently a problem for LR(1) parsers, and the rules for determining the configurating sets and lookaheads is the same regardless of whether your grammar has left recursion.
In your case, we begin by augmenting the grammar with our new start symbol:
S -> E
E -> E + T | T
T -> T * F | F
F -> (E) | NUMBER
Our initial configurating set corresponds to looking at the production S -> E with a lookahead of $. Initially, that gives us the following:
(1)
S -> .E [$]
We now need to expand out what E could be. That gives us these new items:
(1)
S -> .E [$]
E -> .E + T [$]
E -> .T [$]
Now, let's look at the item E -> .E + T [$]. We need to expand out what E could be here, and the rules for doing so are the same as in the non-left-recursive case: we list all productions for E with the dot at the front, with a lookahead given by what follows the E in the production E -> .E + T [$]. In this case we're looking for an E with a lookahead of + because that's what follows is in the production. That adds these items:
(1)
S -> .E [$]
E -> .E + T [$]
E -> .T [$]
E -> .E + T [+]
E -> .T [+]
From here, we expand out all the cases where there's a dot before a T, which gives the following:
(1)
S -> .E [$]
E -> .E + T [$]
E -> .T [$]
E -> .E + T [+]
E -> .T [+]
T -> .T * F [$]
T -> .F [$]
T -> .T * F [+]
T -> .F [+]
We now have to expand out the Ts in the context of T -> .T * F [$], and we do so by listing all productions of T followed by what the T is followed by in T -> .T * F [$] (namely, *). That gives us the following:
(1)
S -> .E [$]
E -> .E + T [$]
E -> .T [$]
E -> .E + T [+]
E -> .T [+]
T -> .T * F [$]
T -> .F [$]
T -> .T * F [+]
T -> .F [+]
T -> .T * F [*]
T -> .F [*]
And from here we'd expand out the productions that have a dot before the F. Do you see how to do that based on how things have played out so far?

How to handle operator precedence in LALR(1) parser

I was writing an LALR(1) parser for a simple arithmetic grammar
E -> E ('+' | '-') T
E -> T
T -> T ('*' | '/') F
T -> F
F -> '( E ')'
F -> int
How would I handle operator precedence for an expression such as 1 * 1 + 2 so that the 1+2 is evaluated before the multiplication?

Extending Grammar for LR Parser

I have the following grammar for basic arithmetic expressions
E -> E + T
E -> T
T -> T * F
T -> F
F -> (E)
F -> id
Where E is expression, T is term, F is factor. I'm wondering how I can extend this grammar to support further arithmetic operations such exponents possibly represented with ^ or logarithm.
Thanks
Since exponentation has higher precedence you could use the following grammar:
E -> E + T
E -> T
T -> T * F
T -> F
F -> G ^ F
F -> G
G -> log(E)
G -> (E)
G -> id

How to restrict backtracking in a monad transformer parser combinator

tl;dr, How do I implement parsers whose backtracking can be restricted, where the parsers are monad transformer stacks?
I haven't found any papers, blogs, or example implementations of this approach; it seems the typical approach to restricting backtracking is a datatype with additional constructors, or the Parsec approach where backtracking is off by default.
My current implementation -- using a commit combinator, see below -- is wrong; I'm not sure about the types, whether it belongs in a type class, and my instances are less generic than it feels like they should be.
Can anyone describe how to do this cleanly, or point me to resources?
I've added my current code below; sorry for the post being so long!
The stack:
StateT
MaybeT/ListT
Either e
The intent is that backtracking operates in the middle layer -- a Nothing or an empty list wouldn't necessarily yield an error, it'd just mean that a different branch should be tried -- whereas the bottom layer is for errors (with some contextual information) that immediately abort the parsing.
{-# LANGUAGE NoMonomorphismRestriction, FunctionalDependencies,
FlexibleInstances, UndecidableInstances #-}
import Control.Monad.Trans.State (StateT(..))
import Control.Monad.State.Class (MonadState(..))
import Control.Monad.Trans.Maybe (MaybeT(..))
import Control.Monad.Trans.List (ListT(..))
import Control.Monad (MonadPlus(..), guard)
type Parser e t mm a = StateT [t] (mm (Either e)) a
newtype DParser e t a =
DParser {getDParser :: Parser e t MaybeT a}
instance Monad (DParser e t) where
return = DParser . return
(DParser d) >>= f = DParser (d >>= (getDParser . f))
instance MonadPlus (DParser e t) where
mzero = DParser (StateT (const (MaybeT (Right Nothing))))
mplus = undefined -- will worry about later
instance MonadState [t] (DParser e t) where
get = DParser get
put = DParser . put
A couple of parsing classes:
class (Monad m) => MonadParser t m n | m -> t, m -> n where
item :: m t
parse :: m a -> [t] -> n (a, [t])
class (Monad m, MonadParser t m n) => CommitParser t m n where
commit :: m a -> m a
Their instances:
instance MonadParser t (DParser e t) (MaybeT (Either e)) where
item =
get >>= \xs -> case xs of
(y:ys) -> put ys >> return y;
[] -> mzero;
parse = runStateT . getDParser
instance CommitParser t (DParser [t] t) (MaybeT (Either [t])) where
commit p =
DParser (
StateT (\ts -> MaybeT $ case runMaybeT (parse p ts) of
Left e -> Left e;
Right Nothing -> Left ts;
Right (Just x) -> Right (Just x);))
And a couple more combinators:
satisfy f =
item >>= \x ->
guard (f x) >>
return x
literal x = satisfy (== x)
Then these parsers:
ab = literal 'a' >> literal 'b'
ab' = literal 'a' >> commit (literal 'b')
give these results:
> myParse ab "abcd"
Right (Just ('b',"cd")) -- succeeds
> myParse ab' "abcd"
Right (Just ('b',"cd")) -- 'commit' doesn't affect success
> myParse ab "acd"
Right Nothing -- <== failure but not an error
> myParse ab' "acd"
Left "cd" -- <== error b/c of 'commit'
The answer appears to be in the MonadOr type class (which unfortunately for me is not part of the standard libraries):
class MonadZero m => MonadOr m where
morelse :: m a -> m a -> m a
satisfying Monoid and Left Catch:
morelse mzero b = b
morelse a mzero = a
morelse (morelse a b) c = morelse a (morelse b c)
morelse (return a) b = return a

Building LR(1) configuration lookahead

I really have some troubles to cauculate the lookahead when building the LR(1) item sets, i had tried some lecture notes form different sites, but still...
My example is
S -> E + S | E
E -> num | ( S )
The item set is
I0:
S’ -> . S $
S -> . E + S $
S -> . E $
E -> . num +,$
E -> . ( S ) +,$
I1:
S ->E .+ S $
S ->E . $
The first item in set I0
S’ -> . S $
is initialization.
The second item in set I0
S -> . E + S $
means there is nothing on stack, we expect to read E+S, then reduce iff the token after E+S is $.
The third item in set I0
S -> . E $
means that we expect to read E and reduce iff the token after E is $.
Then i am confused about the fouth item in set I0,
E -> . num +,$
I have no ideas why there are + and $ tokens.
and if anyone can explain this for me in plain English please.
For each configuration [A –> u•Bv, a] in I, for each production B –> w in G', and for
each terminal b in First(va) such that [B –> •w, b] is not in I: add [B –> •w, b] to I.
Thanks!!!
I think i figured it out.
i am using the algorithm of
for set I0:
Begin with [S' -> .S, $]
Match [A -> α.Bβ, a]
Then add in [B -> .γ, b]
Where terminal b is FIRST(βa)
for set I1...In
Compute GOTO(I0,X)
Add in X productions and LOOKAHEAD token
In the example
S -> E + S
S -> E
E -> num
E -> ( S )
Firstly,
S’ -> . S $
we try to match it to [A -> α.Bβ, a], That is
A =S', α = ε, B = S , β = ε , a = $ and
FIRST(βa) = {$}
Add in [B -> .γ, b], which are
S -> . E + S $ ...1
S -> . E $ ...2
in I0.
Then, we need to add in productions for E as 1 and 2.
In this case, our [A -> α.Bβ, a] are 1 and 2.
Thus, FIRST(βa) = { + , $ }, and we have
E -> . num +,$
E -> . ( S ) +,$
Now, we compute GOTO(I0, X)
For X = E
we move dot one position and found no productions need to be added. So we just add in second component $ from
S -> . E + S $
S -> . E $
which gives us I1
S ->E .+ S $
S ->E . $
and so on...
So, is this the correct and efficient way when building LR(1) item sets?
For
E -> . num +,$
E -> . ( S ) +,$
the +,$ indicate that only these tokens can follow a number or a closing parenthesis. Think about it: The grammar does noty allow adjacent num's or ()'s, they must either be at the end of the sentence or followed by a +.
As for translation request, it is a fancy way of saying how to calculate the set of tokens that can follow a given token. The +,$ above are an example. They are the only legal tokens that can follow num and ).

Resources