Adding additional parameters to data constructors using infix operators - parsing

I have written a data type like
data Expr = IntL Integer | Expr :*: Expr
and would like to annotate it with extra constructor parameters (such as positional information) like this:
data Expr = IntL Integer Pos | Expr :*: Expr Pos
However GHC does not like this:
Expected kind '* -> *' but 'Expr' has kind '*'
In the type 'Expr Pos'
In the definition of data constructor ':*:'
In the data declaration for 'Expr'
I know I could use something like Mul Expr Expr Pos as a workaround, or even wrap Expr in another data constructor, but I'd really like to use the infix operator and cannot figure out a way to do so! Is this possible?
I've tried wrapping the constructor in brackets:
data Expr = IntL Integer Pos | (Expr :*: Expr) Pos
And also making :*: a prefix:
data Expr = IntL Integer Pos | (:*:) Expr Expr Pos
but this does not allow me to pattern match in the same (infix) way. I'm not sure this even makes sense as a data constructor, but thought I'd ask just in case.
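For what it's worth, the prefix declaration is legal Haskell. The sketch below assumes a hypothetical Pos type. Because a constructor is a curried function, (a :*: b) has type Pos -> Expr, so you can still write it infix when building values. A pattern, however, cannot apply an infix pattern to a further argument, so matching has to use the prefix form (:*:) a b p.

```haskell
-- Hypothetical position type (line, column); the question leaves Pos abstract.
data Pos = Pos Int Int deriving (Eq, Show)

-- The constructor operator is declared in prefix form with three fields.
data Expr = IntL Integer Pos
          | (:*:) Expr Expr Pos
          deriving (Eq, Show)

-- Construction: (a :*: b) is a partially applied constructor of type
-- Pos -> Expr, so it can be applied to a position afterwards.
example :: Expr
example = (IntL 2 (Pos 1 1) :*: IntL 3 (Pos 1 5)) (Pos 1 3)

-- Matching: "(a :*: b) p" is not a valid pattern, so use prefix form.
eval :: Expr -> Integer
eval (IntL n _)    = n
eval ((:*:) a b _) = eval a * eval b

main :: IO ()
main = print (eval example)
```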

It might be better to do this with an extra data type, so:
infixl 6 :*:
infixl 7 :#
data Expr = IntL Integer | PosExpr :*: PosExpr
data PosExpr = Expr :# Pos
Then you can construct items with:
(IntL 5 :# foo :*: IntL 6 :# bar) :# qux
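Here is a runnable sketch of this design. It assumes a hypothetical Pos type (line and column), since the question leaves Pos abstract. eval, positions and unPos are illustrative helpers. Both construction and pattern matching keep the infix syntax:

```haskell
-- Hypothetical position type (line, column); not part of the original answer.
data Pos = Pos Int Int deriving (Eq, Show)

infixl 6 :*:
infixl 7 :#

data Expr = IntL Integer | PosExpr :*: PosExpr deriving (Eq, Show)
data PosExpr = Expr :# Pos deriving (Eq, Show)

-- Strip the outermost position annotation.
unPos :: PosExpr -> Expr
unPos (e :# _) = e

-- Pattern matching stays infix: each operand carries its own position.
eval :: Expr -> Integer
eval (IntL n)                = n
eval ((a :# _) :*: (b :# _)) = eval a * eval b

-- Collect the position of every annotated subexpression, outermost first.
positions :: PosExpr -> [Pos]
positions (e :# p) = p : inner e
  where
    inner (IntL _)  = []
    inner (l :*: r) = positions l ++ positions r

-- With the fixities above this parses as
-- ((IntL 5 :# Pos 1 1) :*: (IntL 6 :# Pos 1 5)) :# Pos 1 3
example :: PosExpr
example = (IntL 5 :# Pos 1 1 :*: IntL 6 :# Pos 1 5) :# Pos 1 3

main :: IO ()
main = do
  print (eval (unPos example))
  print (positions example)
```

The cost of this design is that every subexpression must be wrapped with :#, so a Pos is needed at each level rather than only where it is useful.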


Example of removing left-recursion on a simple program

I have the following grammar which intentionally has left-recursion:
grammar DBParser;
statement: expr EOF;
expr: expr ('+' | '-') expr | 'x';
Is there a way to transform this using the method described here as:
A: Aa | b;
// becomes
A: bR;
R: (aR)?;
Does the technique require the initial A to be the leftmost symbol of the alternative, so that it cannot be applied here? And if it can be applied, what would the process look like?
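The rule does apply here: take A = expr, a = ('+' | '-') expr and b = 'x'. This gives expr: 'x' R; R: (('+' | '-') expr R)?;. The A in Aa has to be the leftmost symbol of that alternative, which it is. Further occurrences of A inside a, like the right-hand expr, cause no problem. The original rule is ambiguous about associativity, and so is the result. Below is a minimal sketch of the transformed grammar as a Haskell recognizer, assuming single-character tokens. Each rule becomes a function. Every path consumes 'x' before recursing, so the recognizer terminates, whereas a direct encoding of the left-recursive rule would loop. It resolves the ambiguity by consuming greedily.

```haskell
-- Recognizer for the transformed grammar (a sketch; each token is one Char):
--   expr : 'x' r ;
--   r    : (('+' | '-') expr r)? ;
-- Each function consumes a prefix of its input and returns what is left,
-- or Nothing if the input does not match.
expr :: String -> Maybe String
expr ('x' : rest) = r rest             -- b = 'x', followed by the tail R
expr _            = Nothing

r :: String -> Maybe String
r (c : rest)
  | c `elem` "+-" = expr rest >>= r    -- a R, with a = ('+' | '-') expr
r s = Just s                           -- the empty alternative of ( ... )?

-- statement : expr EOF ;
statement :: String -> Bool
statement s = expr s == Just ""

main :: IO ()
main = mapM_ (print . statement) ["x", "x+x-x", "x+", "+x"]
```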

Good type design in Haskell for the AST of a simple language

I'm new to Haskell, and am working through the Haskell LLVM tutorial. In it, the author defines a simple algebraic data type to represent the AST.
type Name = String
data Expr
  = Float Double
  | BinOp Op Expr Expr
  | Var String
  | Call Name [Expr]
  | Function Name [Expr] Expr
  | Extern Name [Expr]
  deriving (Eq, Ord, Show)
data Op
  = Plus
  | Minus
  | Times
  | Divide
  deriving (Eq, Ord, Show)
However, this is not an ideal structure, because the parser actually expects that the list of Expr in an Extern will only ever contain expressions representing variables (i.e. parameters in this situation cannot be arbitrary expressions). I would like to make the types reflect this constraint (making it easier to generate random valid ASTs using QuickCheck); however, for the sake of consistency in the parser functions (which all have type Parser Expr), I don't just want to say | Extern Name [Name]. I would like to do something like this:
data Expr
  = ...
  | Var String
  ...
  | Function Name [Expr] Expr
  | Extern Name [Var] -- enforce constraint here
  deriving (Eq, Ord, Show)
But it's not possible in Haskell.
To summarize, Extern and Var should both be Expr, and Extern should have a list of Vars representing parameters. Would the best way be to split all of these out and make them instances of an Expr typeclass (that wouldn't have any methods)? Or is there a more idiomatic method (or would it be better to scrap these types and do something totally different)?
Disclaimer: I'm the author of the LLVM tutorial you mentioned.
Just use Extern Name [Name]; everything from Chapter 3 onward in the tutorial uses that exact definition anyway. I think I just forgot to make Chapter 2's Syntax.hs consistent with the others.
I wouldn't worry about making the parser definitions consistent; it's fine for them to return different types. Here's what the later parsers use. identifier is just the Parsec builtin token parser for alphanumeric identifiers, taken from the LanguageDef, and its result becomes the Name type in the AST.
extern :: Parser Expr
extern = do
  reserved "extern"
  name <- identifier
  args <- parens $ many identifier
  return $ Extern name args
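The sketch below is a self-contained version of the same idea. It assumes ReadP from base in place of Parsec, so that it runs without extra packages. As a result, reserved, identifier, parens and lexeme here are hypothetical hand-written stand-ins for the Parsec token-parser functions, not the Parsec API. Only the Extern constructor changes from the question's type. The parser still has type "parser of Expr", while the argument list is typed as [Name].

```haskell
import Data.Char (isAlpha, isAlphaNum)
import Text.ParserCombinators.ReadP

type Name = String

data Expr
  = Float Double
  | BinOp Op Expr Expr
  | Var String
  | Call Name [Expr]
  | Function Name [Expr] Expr
  | Extern Name [Name]            -- parameters are plain names now
  deriving (Eq, Ord, Show)

data Op = Plus | Minus | Times | Divide
  deriving (Eq, Ord, Show)

-- Hypothetical stand-ins for the Parsec token parser;
-- each one skips trailing whitespace.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

identifier :: ReadP Name
identifier = lexeme ((:) <$> satisfy isAlpha <*> munch isAlphaNum)

reserved :: String -> ReadP ()
reserved word = lexeme $ do
  w <- munch1 isAlphaNum
  if w == word then return () else pfail

parens :: ReadP a -> ReadP a
parens = between (lexeme (char '(')) (lexeme (char ')'))

-- Same shape as the tutorial's parser.
extern :: ReadP Expr
extern = do
  reserved "extern"
  name <- identifier
  args <- parens (many identifier)
  return (Extern name args)

-- Accept only a complete, unambiguous parse of the whole input.
parseExtern :: String -> Maybe Expr
parseExtern s =
  case [e | (e, "") <- readP_to_S (skipSpaces *> extern <* eof) s] of
    [e] -> Just e
    _   -> Nothing

main :: IO ()
main = print (parseExtern "extern atan2(a b)")
```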

How to build Parser in Haskell

data Expr = ExprNum Double      -- constants
          | ExprVar String      -- variables
          | ExprAdd Expr Expr
          | ExprSub Expr Expr
          | ExprNeg Expr        -- The unary '-' operator
          | ExprMul Expr Expr
          | ExprDiv Expr Expr
          deriving Show
This is my user-defined data type. I want to parse arithmetic expressions like (2+3 *4 - x) into the above data type without using buildExpressionParser. What can I do?
Please help me. It should handle operator precedence.
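Suppose we want to build the addsub (addition and subtraction) level of the parser. Ignoring the construction of return values and focusing only on the raw parsing, we'd like to say
addsub = (addsub >> oneOf "+-" >> muldiv) <|> muldiv
This doesn't work: Parsec parses top-down, so the left-recursive call to addsub recurses forever without consuming any input. But we can eliminate the left recursion and write this as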
addsub = muldiv >> addsub'
addsub' = many $ oneOf "+-" >> muldiv
Here we assume muldiv is a parser for just multiplication and division, which you can write in a similar manner.
That is, instead of using the grammar
addsub = addsub (+-) muldiv | muldiv
we use the slightly more complicated grammar, which Parsec can actually use:
addsub = muldiv addsub'
addsub' = (+-) muldiv addsub' | ε
where ε is the empty alternative. We can of course refactor addsub' into a many, which gives us the list of (operator, operand) pairs that follow the first operand. You could then fold that list into whatever form you want, such as ExprSub (ExprAdd a1 a2) a3 for a1 + a2 - a3; folding from the left keeps subtraction left-associative.
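Putting the pieces together gives the minimal sketch below for the question's Expr type. It makes several assumptions. It uses ReadP from base rather than Parsec, so it runs without extra packages; ReadP's many and +++ play the roles of Parsec's many and <|>. It accepts only unsigned integer literals and alphabetic variable names. It also adds Eq to the deriving clause for testing. Each precedence level parses its first operand, collects the remaining (operator, operand) pairs with many, and folds them from the left. level is a hypothetical helper that factors out that shared shape.

```haskell
import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP

data Expr = ExprNum Double      -- constants
          | ExprVar String      -- variables
          | ExprAdd Expr Expr
          | ExprSub Expr Expr
          | ExprNeg Expr        -- the unary '-' operator
          | ExprMul Expr Expr
          | ExprDiv Expr Expr
          deriving (Eq, Show)   -- Eq added for testing

-- Run a parser, then skip trailing whitespace.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

symbol :: Char -> ReadP Char
symbol = lexeme . char

-- One precedence level: level = next (op next)*, folded to the left,
-- so "1 - 2 - 3" becomes ExprSub (ExprSub 1 2) 3.
level :: [(Char, Expr -> Expr -> Expr)] -> ReadP Expr -> ReadP Expr
level ops next = do
  first <- next
  rest  <- many ((,) <$> choice [f <$ symbol c | (c, f) <- ops] <*> next)
  return (foldl (\acc (f, e) -> f acc e) first rest)

-- Lowest precedence: addition and subtraction.
addsub :: ReadP Expr
addsub = level [('+', ExprAdd), ('-', ExprSub)] muldiv

-- Higher precedence: multiplication and division.
muldiv :: ReadP Expr
muldiv = level [('*', ExprMul), ('/', ExprDiv)] unary

-- Highest precedence: unary minus, then atoms.
unary :: ReadP Expr
unary = (ExprNeg <$> (symbol '-' *> unary)) +++ atom

atom :: ReadP Expr
atom = number +++ variable +++ between (symbol '(') (symbol ')') addsub
  where
    number   = lexeme (ExprNum . read <$> munch1 isDigit)
    variable = lexeme (ExprVar <$> munch1 isAlpha)

-- Keep only a complete, unambiguous parse of the whole input.
parseExpr :: String -> Maybe Expr
parseExpr s =
  case [e | (e, "") <- readP_to_S (skipSpaces *> addsub <* eof) s] of
    [e] -> Just e
    _   -> Nothing

main :: IO ()
main = print (parseExpr "(2+3 *4 - x)")
```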

What to use in ANTLR4 to resolve ambiguities in more complex cases (instead of syntactic predicates)?

In ANTLR v3, syntactic predicates could be used to resolve ambiguities, i.e., to explicitly tell ANTLR which alternative should be chosen. ANTLR4 seems to simply accept grammars with similar ambiguities, but reports these ambiguities during parsing. It produces a parse tree despite these ambiguities (by choosing the first alternative, according to the documentation). But what can I do if I want it to choose some other alternative? In other words, how can I resolve ambiguities explicitly?
(For the simple case of the dangling else problem see: What to use in ANTLR4 to resolve ambiguities (instead of syntactic predicates)?)
A more complex example:
If I have a rule like this:
expr
: expr '[' expr? ']'
| ID expr
| '[' expr ']'
| ID
| INT
;
This will parse foo[4] as (expr foo (expr [ (expr 4) ])). But I may want to parse it as (expr (expr foo) [ (expr 4) ]). (I.e., always take the first alternative if possible. It is the first alternative, so according to the documentation it should have higher precedence. So why does it build this tree?)
If I understand correctly, I have 2 solutions:
1. Implement the syntactic predicate with a semantic predicate (however, I'm not sure how to do that in this case).
2. Restructure the grammar.
For example, replace expr with e:
e : expr | pe
;
expr
: expr '[' expr? ']'
| ID expr
| ID
| INT
;
pe : '[' expr ']'
;
This seems to work, although the grammar became more complex.
I may have misunderstood some things, but both of these solutions seem less elegant and more complicated than syntactic predicates. I do like the solution for the dangling else problem with the ?? operator, but I'm not sure how to use it in this case. Is it possible?
You may be able to resolve this by placing the ID alternative above ID expr. When left-recursion is eliminated, all of your alternatives which are not left recursive are parsed before your alternatives which are left recursive.
For your example, the first non-left-recursive alternative ID expr matches the entire expression, so there is nothing left to parse afterwards.
To get this expression (expr (expr foo) [ (expr 4) ]), you can use
top : expr EOF;
expr : expr '[' expr? ']' | ID | INT ;

SML datatypes struggle

I have the following grammar that I need to translate into SML datatypes:
Integer ranges over SML integer constants.
Boolean ::= 'true' | 'false'
Operator ::= 'ADD' | 'IF' | 'LESS_THAN'
Arguments ::= ( ',' Expression ) *
Expression ::= 'INT' '(' Integer ')'
| 'BOOL' '(' Boolean ')'
| 'OPERATION' '(' Operator ',' '[' Expression ( ',' Expression ) * ']' ')'
I have managed the following:
datatype BOOL = true | false;
datatype OPERATOR = ADD | IF | LESS_THAN;
datatype INT = INT of int;
However I am struggling with datatypes Arguments and Expression. Any help would be appreciated.
For ARGUMENTS, you can use a sequence of EXPRESSIONs, so something like a list of EXPRESSION would work fine (the commas, brackets and parentheses need to be parsed, but you don't need to store them in your type, since they are always there).
For EXPRESSION, you need to combine the approach you've used in OPERATOR (where you have alternatives) with what you've done in INT (where you have of ...). In other words, it's going to be of the form A of B | C of D | ....
Also, you don't really need INT of int: you could just use a plain int for INT. And SML has a built-in bool type that you could use instead of defining a datatype for BOOL. In other words, you probably don't need to define a datatype at all for either of those; just use what is already present in the language.
PS: It's also customary to add the "homework" tag for homework.
[Edit: for OPERATION you have multiple components, but that's OK: just put them in a tuple, like (A, B), whose type is written a * b. For the sequence of expressions, use a list, as for ARGUMENTS.]
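For comparison, here is the same design sketched in Haskell. In SML, the core would read roughly datatype operator = ADD | IF | LESS_THAN and datatype expression = INT of int | BOOL of bool | OPERATION of operator * expression list. render is a hypothetical helper, not part of the assignment. It shows that the punctuation can be regenerated from the constructors, so it need not be stored. One caveat: a list also admits an empty argument list, which the grammar forbids, so a parser would have to enforce that.

```haskell
import Data.List (intercalate)

-- Operator ::= 'ADD' | 'IF' | 'LESS_THAN'
data Operator = ADD | IF | LESS_THAN
  deriving (Eq, Show)

-- Expression ::= 'INT' '(' Integer ')' | 'BOOL' '(' Boolean ')'
--              | 'OPERATION' '(' Operator ',' '[' Expression (',' Expression)* ']' ')'
-- Keywords and punctuation are not stored; built-in Integer and Bool
-- replace separate INT and BOOL types.
data Expression
  = INT Integer
  | BOOL Bool
  | OPERATION Operator [Expression]   -- the list covers the Arguments sequence
  deriving (Eq, Show)

-- Print an Expression back in the grammar's concrete syntax,
-- re-inserting the punctuation the datatype leaves out.
render :: Expression -> String
render (INT n)  = "INT(" ++ show n ++ ")"
render (BOOL b) = "BOOL(" ++ (if b then "true" else "false") ++ ")"
render (OPERATION op args) =
  "OPERATION(" ++ show op ++ ",[" ++ intercalate "," (map render args) ++ "])"

main :: IO ()
main = putStrLn (render (OPERATION IF [BOOL True, INT 1, INT 2]))
```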
