I have a rule in my grammar such as
A -> B C D E {: ...some actions... :}
;
D -> /*empty*/ {: some actions using attributes of B and C :}
;
To implement the actions associated with the production rule of D, I need to access the parser stack. How can I do that in CUP?
Rewrite your grammar:
A -> A1 E
A1 -> B C D
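The action that was attached to D can then be moved to the end of A1 -> B C D, where the values of B and C are available as ordinary right-hand-side symbols. Since D derives the empty string, A1 is reduced at the same point in the parse where D would have been. If the action for the first production requires B and C as well, then the semantic value of A1 will have to be more complicated (for example, an object holding the values of B, C and D) in order to pass those values through.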
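As a rough illustration of that last point, the sketch below models each semantic action as a Haskell function (CUP itself uses Java actions). The types B, C, D, E, the value A1 and the arithmetic are hypothetical placeholders; what matters is that A1's value bundles B and C together with D's result, so the action for A -> A1 E can still reach them.

```haskell
-- Hypothetical placeholders for whatever values the nonterminals really carry.
newtype B = B Int deriving (Show, Eq)
newtype C = C Int deriving (Show, Eq)
newtype D = D Int deriving (Show, Eq)
newtype E = E Int deriving (Show, Eq)

-- The action that used to sit on D -> /*empty*/; it needs B and C.
actionD :: B -> C -> D
actionD (B b) (C c) = D (b + c)

-- Semantic value of A1 -> B C D: it carries B and C along with D's result.
data A1 = A1 B C D deriving (Show, Eq)

-- Action of A1 -> B C D: D's old action now runs here, where B and C are in scope.
actionA1 :: B -> C -> A1
actionA1 b c = A1 b c (actionD b c)

-- Action of A -> A1 E: unpacks the values that A1 passed through.
actionA :: A1 -> E -> Int
actionA (A1 (B b) (C c) (D d)) (E e) = b + c + d + e

main :: IO ()
main = print (actionA (actionA1 (B 1) (C 2)) (E 4))
```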
I'm working on writing a simple parser in Haskell and have this datatype, which holds the results of the parse.
data AST = Imm Integer
         | ArgName String
         | Arg Integer
         | Add AST AST
         | Sub AST AST
         | Mul AST AST
         | Div AST AST
  deriving (Show, Eq)
The problem comes when I want to map over the tree to replace variable names with their reference numbers using a map. I have to write this code:
refVars :: M.Map String Integer -> AST -> Maybe AST
refVars d (ArgName s) = case d M.!? s of
  Just n  -> Just (Arg n)
  Nothing -> Nothing
refVars _ (Imm n) = Just $ Imm n
refVars _ (Arg n) = Just $ Arg n
refVars d (Add a1 a2) = Add <$> refVars d a1 <*> refVars d a2
refVars d (Sub a1 a2) = Sub <$> refVars d a1 <*> refVars d a2
refVars d (Mul a1 a2) = Mul <$> refVars d a1 <*> refVars d a2
refVars d (Div a1 a2) = Div <$> refVars d a1 <*> refVars d a2
This seems incredibly redundant. Ideally I'd want one pattern that matches any (op a1 a2), but Haskell won't allow that. Any suggestions?
As proposed in the comments, the fix for your immediate problem is to move the information about the operator type out of the constructor:
data Op = Add | Sub | Mul | Div
  deriving (Show, Eq)
data AST = Imm Integer
         | ArgName String
         | Arg Integer
         | Op Op AST AST
  deriving (Show, Eq)
This datatype has one constructor for all of the binary operations, so you only need one line to take it apart:
refVars :: M.Map String Integer -> AST -> Maybe AST
refVars d (ArgName s) = Arg <$> d M.!? s
refVars _ (Imm n) = Just $ Imm n
refVars _ (Arg n) = Just $ Arg n
refVars d (Op op a1 a2) = Op op <$> refVars d a1 <*> refVars d a2
You can now add any number of binary operators without modifying refVars, but if you add different syntactic forms to your AST, you'll have to add clauses to refVars:
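For reference, here is the refactored version as a complete, self-contained program; the environment env and the sample expressions are made up for illustration.

```haskell
import qualified Data.Map as M

data Op = Add | Sub | Mul | Div
  deriving (Show, Eq)

data AST = Imm Integer
         | ArgName String
         | Arg Integer
         | Op Op AST AST
  deriving (Show, Eq)

refVars :: M.Map String Integer -> AST -> Maybe AST
refVars d (ArgName s)   = Arg <$> d M.!? s   -- Nothing if the name is unbound
refVars _ (Imm n)       = Just (Imm n)
refVars _ (Arg n)       = Just (Arg n)
refVars d (Op op a1 a2) = Op op <$> refVars d a1 <*> refVars d a2

-- Hypothetical environment used only for this example.
env :: M.Map String Integer
env = M.fromList [("x", 0), ("y", 1)]

main :: IO ()
main = do
  -- x * (y + 2)
  print (refVars env (Op Mul (ArgName "x") (Op Add (ArgName "y") (Imm 2))))
  -- prints: Just (Op Mul (Arg 0) (Op Add (Arg 1) (Imm 2)))
  -- "z" is not in env, so the whole result is Nothing
  print (refVars env (Op Sub (ArgName "z") (Imm 1)))
  -- prints: Nothing
```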
data AST = -- other constructors as before
         | Ternary AST AST AST
         | List [AST]
         | Call AST [AST]  -- function and args
-- refVars: other clauses as before
refVars d (Ternary cond tt ff) = Ternary <$> refVars d cond <*> refVars d tt <*> refVars d ff
refVars d (List l) = List <$> traverse (refVars d) l
refVars d (Call f args) = Call <$> refVars d f <*> traverse (refVars d) args
So it's still tedious: all this code does is traverse the tree down to the leaves, where refVars can check whether the leaf is an ArgName or something else. The interesting part of refVars is the one ArgName line; the remaining six lines of the function are pure boilerplate.
It'd be nice if we could define "traverse the tree" separately from "handle ArgNames". This is where generic programming comes in. There are lots of generic programming libraries out there, each with its own style and approach, but I'll demonstrate using lens.
The Control.Lens.Plated module defines a Plated class for types which know how to access their children. The deal is: you show lens how to access your children (by passing them to a callback g), and lens can recursively apply that to access the children's children and so on.
instance Plated AST where
  plate g (Op op a1 a2)        = Op op <$> g a1 <*> g a2
  plate g (Ternary cond tt ff) = Ternary <$> g cond <*> g tt <*> g ff
  plate g (List l)             = List <$> traverse g l
  plate g (Call f args)        = Call <$> g f <*> traverse g args
  plate _ a                    = pure a
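The "deal" can be seen without lens: because plate only asks for an Applicative, one definition serves several jobs depending on which Applicative you pick. The sketch below copies the instance into a plain function (plateAST, a hypothetical name) so it needs no library; childrenOf, overChildren and universeOf are illustrative analogues of lens functions such as children and universe, not their actual definitions.

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

data Op = Add | Sub | Mul | Div deriving (Show, Eq)

data AST = Imm Integer
         | ArgName String
         | Arg Integer
         | Op Op AST AST
         | Ternary AST AST AST
         | List [AST]
         | Call AST [AST]
  deriving (Show, Eq)

-- Same clauses as the Plated instance, as a plain function.
plateAST :: Applicative f => (AST -> f AST) -> AST -> f AST
plateAST g (Op op a1 a2)        = Op op <$> g a1 <*> g a2
plateAST g (Ternary cond tt ff) = Ternary <$> g cond <*> g tt <*> g ff
plateAST g (List l)             = List <$> traverse g l
plateAST g (Call f args)        = Call <$> g f <*> traverse g args
plateAST _ a                    = pure a

-- With f = Const [AST], the callback just records each immediate child.
childrenOf :: AST -> [AST]
childrenOf = getConst . plateAST (\c -> Const [c])

-- With f = Identity, the callback rewrites each immediate child (one level).
overChildren :: (AST -> AST) -> AST -> AST
overChildren h = runIdentity . plateAST (Identity . h)

-- Recursing through the children gives every node, root first.
universeOf :: AST -> [AST]
universeOf t = t : concatMap universeOf (childrenOf t)

main :: IO ()
main = do
  print (childrenOf (Call (ArgName "f") [Imm 1, Imm 2]))
  print (length (universeOf (List [Op Add (Imm 1) (Imm 2), Imm 3])))
```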
Aside: you might object that even writing plate clause-by-clause is too much boilerplate, and that the compiler should be able to locate the AST's children for you. lens has your back: there's a default implementation of plate for any type which is an instance of Data, so you should be able to slap deriving Data onto AST (and onto Op, since every field type needs a Data instance too; this needs the DeriveDataTypeable extension) and leave the Plated instance empty.
Now we can implement refVars using transformM :: (Monad m, Plated a) => (a -> m a) -> a -> m a.
-- needs {-# LANGUAGE LambdaCase #-} and transformM from Control.Lens.Plated
refVars :: M.Map String Integer -> AST -> Maybe AST
refVars d = transformM $ \case
  ArgName s -> Arg <$> d M.!? s
  x         -> Just x
transformM takes a (monadic) transformation function and applies it bottom-up to every node of the AST, the root included. Our transformation function looks for ArgName nodes and replaces them with Arg nodes, leaving every other node unchanged. Because the monad is Maybe, a single unbound name makes the whole result Nothing.
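To see what transformM does without pulling in lens, here is a dependency-free sketch. plateAST and transformMAST are hypothetical stand-ins for plate and transformM: the recursion (rewrite the children first, then apply the function to the rebuilt node) follows the bottom-up behaviour described above, but it is an illustration, not lens's own code.

```haskell
{-# LANGUAGE LambdaCase #-}
import qualified Data.Map as M

data Op = Add | Sub | Mul | Div deriving (Show, Eq)

data AST = Imm Integer
         | ArgName String
         | Arg Integer
         | Op Op AST AST
         | Ternary AST AST AST
         | List [AST]
         | Call AST [AST]
  deriving (Show, Eq)

-- "How to reach the children", written once.
plateAST :: Applicative f => (AST -> f AST) -> AST -> f AST
plateAST g (Op op a1 a2)        = Op op <$> g a1 <*> g a2
plateAST g (Ternary cond tt ff) = Ternary <$> g cond <*> g tt <*> g ff
plateAST g (List l)             = List <$> traverse g l
plateAST g (Call f args)        = Call <$> g f <*> traverse g args
plateAST _ a                    = pure a

-- Bottom-up monadic rewrite: transform all children recursively,
-- then apply f to the node rebuilt from the transformed children.
transformMAST :: Monad m => (AST -> m AST) -> AST -> m AST
transformMAST f = go
  where
    go t = plateAST go t >>= f

-- "What to do at an ArgName", written separately.
refVars :: M.Map String Integer -> AST -> Maybe AST
refVars d = transformMAST $ \case
  ArgName s -> Arg <$> d M.!? s
  x         -> Just x

main :: IO ()
main = do
  let env = M.fromList [("f", 7), ("x", 0)]
  print (refVars env (Call (ArgName "f") [Op Add (ArgName "x") (Imm 1)]))
  print (refVars env (List [ArgName "unbound"]))
```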
For a more detailed explanation, see this paper (or the accompanying slides, if you prefer) by Neil Mitchell. It's what the Plated module is based on.
Here's how you could do it with Edward Kmett's recursion-schemes package:
{-# LANGUAGE DeriveTraversable, TemplateHaskell, TypeFamilies #-}
import Data.Functor.Foldable
import Data.Functor.Foldable.TH
import qualified Data.Map as M
data AST = Imm Integer
         | ArgName String
         | Arg Integer
         | Add AST AST
         | Sub AST AST
         | Mul AST AST
         | Div AST AST
  deriving (Show, Eq)
makeBaseFunctor ''AST
refVars :: M.Map String Integer -> AST -> Maybe AST
refVars d (ArgName s) = case d M.!? s of
  Just n  -> Just (Arg n)
  Nothing -> Nothing
refVars d a = fmap embed . traverse (refVars d) . project $ a
This works because your refVars function recurses just like a traverse. Running makeBaseFunctor ''AST generates an auxiliary type, ASTF, that mirrors AST but replaces each recursive AST field with a type parameter, and gives it a Traversable instance. We then use project to switch to the auxiliary type, traverse to do the recursion, and embed to switch back to your type.
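If you would rather avoid Template Haskell, the base functor can be written by hand. The sketch below follows the ASTF/ImmF naming that makeBaseFunctor uses, but the hand-written projectAST and embedAST are hypothetical stand-ins for recursion-schemes' project and embed, so the example only needs base and containers.

```haskell
{-# LANGUAGE DeriveTraversable #-}
import qualified Data.Map as M

data AST = Imm Integer
         | ArgName String
         | Arg Integer
         | Add AST AST
         | Sub AST AST
         | Mul AST AST
         | Div AST AST
  deriving (Show, Eq)

-- Same constructors, with every recursive AST field replaced by r.
data ASTF r = ImmF Integer
            | ArgNameF String
            | ArgF Integer
            | AddF r r
            | SubF r r
            | MulF r r
            | DivF r r
  deriving (Functor, Foldable, Traversable)

-- Peel off one layer of the tree.
projectAST :: AST -> ASTF AST
projectAST (Imm n)     = ImmF n
projectAST (ArgName s) = ArgNameF s
projectAST (Arg n)     = ArgF n
projectAST (Add a b)   = AddF a b
projectAST (Sub a b)   = SubF a b
projectAST (Mul a b)   = MulF a b
projectAST (Div a b)   = DivF a b

-- Put one layer back.
embedAST :: ASTF AST -> AST
embedAST (ImmF n)     = Imm n
embedAST (ArgNameF s) = ArgName s
embedAST (ArgF n)     = Arg n
embedAST (AddF a b)   = Add a b
embedAST (SubF a b)   = Sub a b
embedAST (MulF a b)   = Mul a b
embedAST (DivF a b)   = Div a b

-- The derived traverse visits exactly the recursive positions.
refVars :: M.Map String Integer -> AST -> Maybe AST
refVars d (ArgName s) = Arg <$> d M.!? s
refVars d a = embedAST <$> traverse (refVars d) (projectAST a)

main :: IO ()
main = print (refVars (M.fromList [("x", 5)]) (Add (ArgName "x") (Imm 1)))
```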
Side note: you can simplify the ArgName case to just refVars d (ArgName s) = Arg <$> d M.!? s.
I'm currently looking at two closure calculation examples using the tool at
http://jsmachines.sourceforge.net/machines/lr1.html
Example 1
S -> A c
A -> b B
B -> A b
Here, the initial state ends up with a closure of:
{[S -> .A c, $]; [A -> .b B, c]}
Example 2
S -> A B
A -> a
B -> b
B -> ''
The calculated first step closure is:
{[S -> .A B, $]; [A -> .a, b/$]}
In example 1, why is b, which follows A in rule 3, not included in the lookahead? In example 2, we look past B to see that $ is part of the lookahead, so is there some special reason not to consider all rules in example 1?
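When computing the closure of an item with ". A α" (that is, [X -> β . A α, a]), the new items for A's productions get FIRST(α) as their lookahead, and the containing (parent) lookahead a is included only if ε ∈ FIRST(α). In example 1, ε ∉ FIRST(c), so the lookahead is just c. In example 2, ε ∈ FIRST(B), so we add the containing lookahead ($ in this case) to the lookahead. Rule 3 (B -> A b) does not contribute to the initial state of example 1 at all, because no item there has the dot before B. Its b only shows up as a lookahead in the state reached after shifting b, where [A -> b . B, c] brings in [B -> .A b, c] and, from that, [A -> .b B, b].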
FOLLOW sets are never relevant here: canonical LR(1) closure uses only FIRST sets and the items' own lookaheads.
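The closure rule can be checked mechanically. The following is a small, unoptimised sketch of canonical LR(1) closure; the grammar encoding, the "$" end marker and all function names are assumptions made for the example. Run on the two grammars from the question, it should produce the same initial states as the tool, and the later state in which b appears as a lookahead.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type Symbol = String

-- A production lhs -> rhs; an empty rhs is an epsilon production.
type Production = (Symbol, [Symbol])

-- An LR(1) item: production, dot position, lookahead terminal.
data Item = Item Production Int Symbol
  deriving (Eq, Ord, Show)

isNonTerminal :: [Production] -> Symbol -> Bool
isNonTerminal g s = any ((== s) . fst) g

-- Nonterminals that can derive epsilon (fixpoint iteration).
nullables :: [Production] -> S.Set Symbol
nullables g = go S.empty
  where
    go ns =
      let ns' = S.fromList [lhs | (lhs, rhs) <- g, all (`S.member` ns) rhs]
       in if ns' == ns then ns else go ns'

-- FIRST of a sequence: possible first terminals, and whether it is nullable.
firstSeq :: [Production] -> S.Set Symbol -> M.Map Symbol (S.Set Symbol)
         -> [Symbol] -> (S.Set Symbol, Bool)
firstSeq g nul fs = go
  where
    go [] = (S.empty, True)
    go (x : xs)
      | not (isNonTerminal g x) = (S.singleton x, False)
      | x `S.member` nul =
          let (rest, restNullable) = go xs
           in (M.findWithDefault S.empty x fs `S.union` rest, restNullable)
      | otherwise = (M.findWithDefault S.empty x fs, False)

-- FIRST sets of all nonterminals (fixpoint iteration).
firstSets :: [Production] -> M.Map Symbol (S.Set Symbol)
firstSets g = go (M.fromList [(lhs, S.empty) | (lhs, _) <- g])
  where
    nul = nullables g
    go fs =
      let fs' = M.fromListWith S.union
                  [(lhs, fst (firstSeq g nul fs rhs)) | (lhs, rhs) <- g]
       in if fs' == fs then fs else go fs'

-- For [X -> beta . A alpha, a], add [A -> . gamma, b] for every b in
-- FIRST(alpha), plus a itself when alpha can derive epsilon.
closure :: [Production] -> S.Set Item -> S.Set Item
closure g = go
  where
    nul = nullables g
    fs = firstSets g
    go items =
      let new =
            S.fromList
              [ Item p 0 b
              | Item (_, rhs) d a <- S.toList items
              , (nt : alpha) <- [drop d rhs]
              , isNonTerminal g nt
              , p@(lhs, _) <- g
              , lhs == nt
              , let (firstAlpha, alphaNullable) = firstSeq g nul fs alpha
              , b <- S.toList (if alphaNullable then S.insert a firstAlpha else firstAlpha)
              ]
          items' = S.union items new
       in if items' == items then items else go items'

-- Initial state: closure of the start items [S -> . rhs, $].
initialState :: [Production] -> S.Set Item
initialState g = closure g (S.fromList [Item p 0 "$" | p@("S", _) <- g])

example1, example2 :: [Production]
example1 = [("S", ["A", "c"]), ("A", ["b", "B"]), ("B", ["A", "b"])]
example2 = [("S", ["A", "B"]), ("A", ["a"]), ("B", ["b"]), ("B", [])]

main :: IO ()
main = do
  mapM_ print (S.toList (initialState example1))
  mapM_ print (S.toList (initialState example2))
```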
The following grammar generates the sentences a, a, a, b, b, b, ..., h, b. Unfortunately, it is not LR(1), so it cannot be used with tools such as "yacc".
S -> a comma a.
S -> C comma b.
C -> a | b | c | d | e | f | g | h.
Is it possible to transform this grammar to be LR(1) (or even LALR(1), LL(k) or LL(1)) without the need to expand the nonterminal C and thus significantly increase the number of productions?
Not as long as you have the nonterminal C unchanged preceding comma in some rule.
In that case, it is clear that a parser cannot decide, having seen an "a" and having lookahead "comma", whether to shift the comma (for S -> a comma a) or to reduce the "a" to C (for S -> C comma b). So with C unchanged, this grammar is not LR(1), as you have said.
But the solution lies in the two phrases "having seen an 'a'" and "C unchanged". You asked if there's a fix that doesn't expand C. There isn't, but you could expand C "a little bit" by removing "a" from C, since that's the source of the problem:
S -> a comma a .
S -> a comma b .
S -> C comma b .
C -> b | c | d | e | f | g | h .
So we did not "significantly" increase the number of productions: counting each alternative of C as a production, there are ten both before and after.
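A quick sanity check that taking "a" out of C does not change the language: since neither grammar is recursive, every sentence can simply be enumerated. The list-of-productions encoding and the expand helper are illustrative assumptions; expand only terminates on non-recursive grammars like these.

```haskell
import qualified Data.Set as S

type Grammar = [(String, [String])]

-- The grammar from the question.
original :: Grammar
original =
  [ ("S", ["a", "comma", "a"])
  , ("S", ["C", "comma", "b"])
  ] ++ [("C", [t]) | t <- ["a", "b", "c", "d", "e", "f", "g", "h"]]

-- The same grammar with "a" taken out of C.
transformed :: Grammar
transformed =
  [ ("S", ["a", "comma", "a"])
  , ("S", ["a", "comma", "b"])
  , ("S", ["C", "comma", "b"])
  ] ++ [("C", [t]) | t <- ["b", "c", "d", "e", "f", "g", "h"]]

-- All terminal strings derivable from a symbol (non-recursive grammars only).
expand :: Grammar -> String -> [[String]]
expand g s =
  case [rhs | (lhs, rhs) <- g, lhs == s] of
    []   -> [[s]]  -- no productions: s is a terminal
    alts -> concatMap (fmap concat . mapM (expand g)) alts

language :: Grammar -> S.Set [String]
language g = S.fromList (expand g "S")

main :: IO ()
main = print (language original == language transformed)
```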
Unfortunately, ANTLR cannot handle direct left recursion in a rule that takes parameters, so the only viable option is to remove the left recursion. Is there a way to remove the left recursion in the following grammar?
a[int x]
  : b a[$x] c
  | a[$x - 1]
    (
      c a[$x - 1]
    | b c
    )
  ;
The problem is in the second alternative involving left recursion. Any kind of help would be much appreciated.
Without the parameters and with simpler formatting, it would look like this:
a
: b a c
| a (c a | b c)
;
If a's left-recursive alternative is applied n times, then (c a | b c) is matched n times, preceded by the terminating b a c (the first alternative). That means this rule always starts with b a c, followed by zero or more occurrences of (c a | b c):
a
: b a c (c a | b c)*
;
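The same reasoning applies to any rule with immediate left recursion. As a sketch (the function name and the list-of-symbols encoding are hypothetical), the transformation can be written as a function that splits the alternatives into left-recursive ones (r alpha) and the rest (beta), then prints r : beta (alpha)*. It ignores rule parameters and indirect left recursion.

```haskell
import Data.List (intercalate, partition)

-- One alternative of a rule: a sequence of rule references and tokens.
type Alt = [String]

-- Textbook removal of immediate left recursion, rendered as EBNF:
--   r : r alpha1 | ... | r alphaN | beta1 | ... | betaM
-- becomes
--   r : (beta1 | ... | betaM) (alpha1 | ... | alphaN)*
removeLeftRecursion :: String -> [Alt] -> String
removeLeftRecursion r alts = r ++ " : " ++ prefix ++ loop
  where
    (recursive, betas) = partition startsWithR alts
    startsWithR (x : _) = x == r
    startsWithR [] = False
    alphas = map (drop 1) recursive  -- strip the leading r
    prefix = case betas of
      [single] -> unwords single
      _        -> "(" ++ alternatives betas ++ ")"
    loop
      | null alphas = ""
      | otherwise = " (" ++ alternatives alphas ++ ")*"
    alternatives = intercalate " | " . map unwords

main :: IO ()
main = putStrLn (removeLeftRecursion "a" [["b", "a", "c"], ["a", "c", "a"], ["a", "b", "c"]])
-- prints: a : b a c (c a | b c)*
```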
I have a rule that looks like this:
a : (b | c) d;
b : 'B';
c : 'C';
d : 'D';
With this grammar, ANTLR builds a flat parse tree. How can I rewrite the first rule (and leave the other rules unchanged) so that whatever is matched is returned under a root node called A?
If the first production rule were like this:
a : b d;
then it could have been rewritten as
a : b d -> ^(A b d)
and that would have solved my problem. However, the first grammar rule yields two possible shapes for the resulting tree: ^(A b d) or ^(A c d).
How do I express this when rewriting the rule?
You can use the ? operator in the rewrite, so that b and c are each included only if they were matched:
a : (b | c) d -> ^(A b? c? d);
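To spell out what the ? markers do, here is a small Haskell model of the rewrite. Tree, leaf and rewriteA are hypothetical and only mimic the resulting tree shape; they are not part of ANTLR's API. Because exactly one of b and c is matched, exactly one of them ends up under A, followed by d.

```haskell
import Data.Maybe (catMaybes)

-- Hypothetical stand-in for a tree node: a label and its children.
data Tree = Node String [Tree]
  deriving (Show, Eq)

leaf :: String -> Tree
leaf s = Node s []

-- Models ^(A b? c? d): b and c become children only if they were matched;
-- d is always present.
rewriteA :: Maybe Tree -> Maybe Tree -> Tree -> Tree
rewriteA mb mc d = Node "A" (catMaybes [mb, mc] ++ [d])

main :: IO ()
main = do
  -- input "B D": the (b | c) subrule matched b
  print (rewriteA (Just (leaf "B")) Nothing (leaf "D"))
  -- input "C D": the (b | c) subrule matched c
  print (rewriteA Nothing (Just (leaf "C")) (leaf "D"))
```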