I've been trying for the past couple of days write a menhir (yacc) grammar for this seemingly simple language with some syntax sugar, but can't figure out to do so without ambiguities.
The relevant constructs from the language are:
Application is juxtaposition: x y z is a valid expression.
Allow type annotations: given some expression e and type t, you can write e : t. the : should be a non-associative operator.
You can (and sometimes need to) wrap certain expressions in parentheses
Arrow types: expressions of the form (x y z : a) -> b, which is sugar for (x : a) -> (y : a) -> (z : a) -> b (-> is right-associative). here x y z are identifiers, a and b are other expressions
Arrows can also be written as just a -> b, which is sugar for (_ : a) -> b.
My initial strategy was to split the expressions to two types based on the intuitive question "does it need to be wrapped in parentheses when used in application?", if the answer is yes I call it an "atom", otherwise it's an expression.
For example variables are atoms, because I can write x y. Parenthesized expressions are also atoms because they can be juxtaposed. Lambda abstraction is not an atom because it's too ambiguous (even for a human reader) to parse λx . y z - you have to write (λx . y) z. The (minimal error reproducing) grammar can be outlined as follows
%token EOF
%token <string> IDENT
%token LPAREN RPAREN COLON ARROW BOOL
%nonassoc COLON
%right ARROW
%start program
%type <expr> program
%type <expr> expr
%type <expr> atom
%%
program:
| e=expr; EOF { e }
expr:
| es=nonempty_list(atom) { unfoldApp es } // application
| e=expr; COLON; t=expr { Annot (e, t) } // annotation
| LPAREN; xs=nonempty_list(IDENT); COLON;
a=expr; RPAREN; ARROW; b=expr { unfoldPi xs a b } // arrow types
| a=expr; ARROW; b=expr { Pi ("_", a, b) } // sugar (not necessary to reproduce)
// more cases for lambda, let abstractions, etc...
atom:
| x=IDENT { Var x } // variable
| LPAREN; e=expr; RPAREN { e } // parentheses
| BOOL { Bool }
// more constants and atoms
There's clearly an undeniable ambiguity here. After the parser sees ( x with lookahead : it might be a variable x, annotated and parenthesized. Or, it could be the start of an arrow with just one variable. Even with more variables, it could be an annotated application, so the ambiguity remains. I want to prioritize the latter, meaning if it can be an arrow type, choose that.
I could probably check in the AST later if it's something like Annot (App (App (Var "x", Var "y"), Var "z"), a) and then convert manually to the correct result, but that feels sketchy and unreliable.
Is there some clean and easy way to do it with a LALR generator such as menhir?
I have problem with parsers that can manage multiple types, e.g., the parser for arithmetic expressions. At the moment my parser takes in input only integers:
aexpr :: Parser Int
aexpr = aterm `chainl` addOp
aterm :: Parser Int
aterm = afactor `chainl` mulOp
afactor :: Parser Int
afactor = parenP '(' aexpr ')'
<|>
do
s <- sign
a <- aexpr
return (s * a)
<|>
token int_const
This works well for two integers, but I am introducing the float type, so the parser for arithmetic expressions can have in input two Int, two Float or two mixed types of Int and Float. The result type of the parser should be Int or Float based on the inputs (i.e., Int if both inputs are Int, Float otherwise).
My question is: what is the best way to manage this? A simple way is to just manage all the inputs as Float, so integer values are converted into float values. But I don't like this solution: the end result will be always Float.
Should I create two different parsers, i.e., one for Int and one for Float? But the two parsers would be pretty much the same.
There is a better solution?
When designing DSLs, it's very common to define a new data type representing all the different shapes values can have:
data Value
= I Int
| D Double
deriving (Eq, Ord, Read, Show)
You may want to implement some helper functions, like:
binOp :: (forall a. Num a => a -> a -> a) -> Value -> Value -> Parser Value
binOp f (I a) (I b) = pure (I (f a b))
binOp f (D a) (D b) = pure (D (f a b))
binOp _ _ _ = fail "oof, tried to mix Ints and Doubles"
Then your mulOp and addOp implementations could call binOp (*) and binOp (+) somewhere in their implementation, for example.
But I would consider another approach than using such helpers directly. I would propose introducing an intermediate representation for which parsing "always succeeds", then add a separate type-checking phase where you can throw errors about mixing ints and doubles... or do appropriate casts, or whatever your DSL wants to happen there. So:
data BinOp = Plus | Times deriving (Bounded, Enum, Eq, Ord, Read, Show)
data Type = Int | Double deriving (Bounded, Enum, Eq, Ord, Read, Show)
data DSL
= ILit Int
| DLit Double
| BinOp BinOp DSL DSL
deriving (Eq, Ord, Read, Show)
Then you can write things with types like
typeCheck :: DSL -> These [TypeError] Type -- check for mixing
unsafeEval :: DSL -> Value -- implementation uses incomplete patterns and assumes no mixing
eval :: DSL -> Either [TypeError] Value
eval t = case typeCheck t of
That _ -> Right (unsafeEval t)
These [] _ -> Right (unsafeEval t)
These es _ -> Left es
This es -> Left es
or whatever.
beginner f# programmer here.
So basically I'm totally lost. I've been staring at this question for the past hour not even knowing how to set up the first line of the let argument. This question is asking to use pattern matching define a recursive function size : expr -> int that returns the size of its input expression denfied as the number of constructors from type expr in the expression
This is the recursive function:
size : expr -> int
Here are the constructors:
type oper = Neg | Not | Add | Mul | Sub | Less | Eq | And
type expr =
| C of int
| Op1 of oper * expr
| Op2 of oper * expr * expr
| If of expr * expr * expr
for example,
size(C 4)
would return 1
and
size (IF (C 4, Op2 (Add, C 1, C 2), C ())
would return 6
UPDATED AFTER SUGGESTION: IN PROGRESS!!
let rec size (e : expr) : int =
match e with
| C i -> 1
| Op1 (o, e1) -> size e1 + 1
| Op2 (o, e1, e2)-> size e2 + 1
| If (e1, e2, e3) -> size e3 + 1
I'll summarize the hints that you were given in the comments, so that others who find this question can see that it was answered and that you were able to solve the problem:
Your original code didn't need a parameter of type oper; a single parameter of type expr was enough.
You don't need a mutable counter variable; recursion will serve you better.
When writing a recursive function like size, you can sometimes get stuck on "Okay, I need to call my size function here... but since I haven't written it yet, how do I know what it's going to return?" The best way to get unstuck is to pretend you've written the function already, so you already know what it's going to return -- and then use that function. And magically, it all works out: by the time you've finished the function, the function that it was calling (itself) is finished too! Funny how that works out. :-)
At one point you had a function that looked like this:
let rec size (e : expr) : int =
match e with
| C i -> 1
| Op1 (o, e1) -> size e + 1
// Rest of function omitted
That was giving you a StackOverflowException, because calling size e inside the Op1 match case was an infinite recursion loop. Hint: trace that call mentally, and think about the steps it would go through. It would check e against Op1, and call size e for the second time. That call would check e against Op1, and call size e for the third time. Would that ever terminate? Would those calls every do anything different from the previous calls, or would they keep looping "forever" until the function stack runs out of space?
Finally, in the cases where you have two or three expr variables, you need to deal with all of them, not just one.
I have been searching around the interwebs for a couple of days, trying to get an answer to my questions and i'm finally admitting defeat.
I have been given a grammar:
Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr
And i have been told to parse, evaluate and print expressions using this grammar
where the operators * + - has their normal meaning
The specific task is to write a function parse :: String -> AST
that takes a string as input and returns an abstract syntax tree when the input is in the correct format (which i can asume it is).
I am told that i might need a suitable data type and that data type might need to derive from some other classes.
Following an example output
data AST = Leaf Int | Sum AST AST | Min AST | ...
Further more, i should consider writing a function
tokens::String -> [String]
to split the input string into a list of tokens
Parsing should be accomplished with
ast::[String] -> (AST,[String])
where the input is a list of tokens and it outputs an AST, and to parse sub-expressions i should simply use the ast function recursively.
I should also make a printExpr method to print the result so that
printE: AST -> String
printE(parse "* 5 5") yields either "5*5" or "(5*5)"
and also a function to evaluate the expression
evali :: AST -> Int
I would just like to be pointed in the right direction of where i might start. I have little knowledge of Haskell and FP in general and trying to solve this task i made some string handling function out of Java which made me realize that i'm way off track.
So a little pointer in the right direction, and maybe an explantion to 'how' the AST should look like
Third day in a row and still no running code, i really appreciate any attempt to help me find a solution!
Thanks in advance!
Edit
I might have been unclear:
I'm wondering how i should go about from having read and tokenized an input string to making an AST.
Parsing tokens into an Abstract Syntax Tree
OK, let's take your grammar
Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr
This is a nice easy grammar, because you can tell from the first token what sort of epression it will be.
(If there was something more complicated, like + coming between numbers, or - being used for subtraction as
well as negation, you'd need the list-of-successes trick, explained in
Functional Parsers.)
Let's have some sample raw input:
rawinput = "- 6 + 45 let x = - 5 in * x x"
Which I understand from the grammar represents "(- 6 (+ 45 (let x=-5 in (* x x))))",
and I'll assume you tokenised it as
tokenised_input' = ["-","6","+","4","5","let","x","=","-","5","in","*","x","x"]
which fits the grammar, but you might well have got
tokenised_input = ["-","6","+","45","let","x","=","-","5","in","*","x","x"]
which fits your sample AST better. I think it's good practice to name your AST after bits of your grammar,
so I'm going to go ahead and replace
data AST = Leaf Int | Sum AST AST | Min AST | ...
with
data Expr = E_Int Int | E_Neg Expr | E_Sum Expr Expr | E_Prod Expr Expr | E_Var Char
| E_Let {letvar::Char,letequal:: Expr,letin::Expr}
deriving Show
I've named the bits of an E_Let to make it clearer what they represent.
Writing a parsing function
You could use isDigit by adding import Data.Char (isDigit) to help out:
expr :: [String] -> (Expr,[String])
expr [] = error "unexpected end of input"
expr (s:ss) | all isDigit s = (E_Int (read s),ss)
| s == "-" = let (e,ss') = expr ss in (E_Neg e,ss')
| s == "+" = (E_Sum e e',ss'') where
(e,ss') = expr ss
(e',ss'') = expr ss'
-- more cases
Yikes! Too many let clauses obscuring the meaning,
and we'll be writing the same code for E_Prod and very much worse for E_Let.
Let's get this sorted out!
The standard way of dealing with this is to write some combinators;
instead of tiresomely threading the input [String]s through our definition, define ways to
mess with the output of parsers (map) and combine
multiple parsers into one (lift).
Clean up the code 1: map
First we should define pmap, our own equivalent of the map function so we can do pmap E_Neg (expr1 ss)
instead of let (e,ss') = expr1 ss in (E_Neg e,ss')
pmap :: (a -> b) -> ([String] -> (a,[String])) -> ([String] -> (b,[String]))
nonono, I can't even read that! We need a type synonym:
type Parser a = [String] -> (a,[String])
pmap :: (a -> b) -> Parser a -> Parser b
pmap f p = \ss -> let (a,ss') = p ss
in (f a,ss')
But really this would be better if I did
data Parser a = Par [String] -> (a,[String])
so I could do
instance Functor Parser where
fmap f (Par p) = Par (pmap f p)
I'll leave that for you to figure out if you fancy.
Clean up the code 2: combining two parsers
We also need to deal with the situation when we have two parsers to run,
and we want to combine their results using a function. This is called lifting the function to parsers.
liftP2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c
liftP2 f p1 p2 = \ss0 -> let
(a,ss1) = p1 ss0
(b,ss2) = p2 ss1
in (f a b,ss2)
or maybe even three parsers:
liftP3 :: (a -> b -> c -> d) -> Parser a -> Parser b -> Parser c -> Parser d
I'll let you think how to do that.
In the let statement you'll need liftP5 to parse the sections of a let statement,
lifting a function that ignores the "=" and "in". You could make
equals_ :: Parser ()
equals_ [] = error "equals_: expected = but got end of input"
equals_ ("=":ss) = ((),ss)
equals_ (s:ss) = error $ "equals_: expected = but got "++s
and a couple more to help out with this.
Actually, pmap could also be called liftP1, but map is the traditional name for that sort of thing.
Rewritten with the nice combinators
Now we're ready to clean up expr:
expr :: [String] -> (Expr,[String])
expr [] = error "unexpected end of input"
expr (s:ss) | all isDigit s = (E_Int (read s),ss)
| s == "-" = pmap E_Neg expr ss
| s == "+" = liftP2 E_Sum expr expr ss
-- more cases
That'd all work fine. Really, it's OK. But liftP5 is going to be a bit long, and feels messy.
Taking the cleanup further - the ultra-nice Applicative way
Applicative Functors is the way to go.
Remember I suggested refactoring as
data Parser a = Par [String] -> (a,[String])
so you could make it an instance of Functor? Perhaps you don't want to,
because all you've gained is a new name fmap for the perfectly working pmap and
you have to deal with all those Par constructors cluttering up your code.
Perhaps this will make you reconsider, though; we can import Control.Applicative,
then using the data declaration, we can
define <*>, which sort-of means then and use <$> instead of pmap, with *> meaning
<*>-but-forget-the-result-of-the-left-hand-side so you would write
expr (s:ss) | s == "let" = E_Let <$> var *> equals_ <*> expr <*> in_ *> expr
Which looks a lot like your grammar definition, so it's easy to write code that works first time.
This is how I like to write Parsers. In fact, it's how I like to write an awful lot of things.
You'd only have to define fmap, <*> and pure, all simple ones, and no long repetative liftP3, liftP4 etc.
Read up about Applicative Functors. They're great.
Applicative in Learn You a Haskell for Great Good!
Applicative in Haskell wikibook
Hints for making Parser applicative: pure doesn't change the list.
<*> is like liftP2, but the function doesn't come from outside, it comes as the output from p1.
To make a start with Haskell itself, I'd recommend Learn You a Haskell for Great Good!, a very well-written and entertaining guide. Real World Haskell is another oft-recommended starting point.
Edit: A more fundamental introduction to parsing is Functional Parsers. I wanted How to replace a failure by a list of successes by PHilip Wadler. Sadly, it doesn't seem to be available online.
To make a start with parsing in Haskell, I think you should first read monadic parsing in Haskell, then maybe this wikibook example, but also then the parsec guide.
Your grammar
Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr
suggests a few abstract data types:
data Dig = Dig_0 | Dig_1 | Dig_2 | Dig_3 | Dig_4 | Dig_5 | Dig_6 | Dig_7 | Dig_8 | Dig_9
data Integ = I_Dig Dig | I_DigInt Dig Integ
data Var = Var_a | Var_b | ... Var_z | Var_A | Var_B | Var_C | ... | Var_Z
data Expr = Expr_I Integ
| Expr_Neg Expr
| Expr_Plus Expr Expr
| Expr_Times Expr Expr Var
| Expr_Var Var
| Expr_let Var Expr Expr
This is inherently a recursively defined syntax tree, no need to make another one.
Sorry about the clunky Dig_ and Integ_ stuff - they have to start with an uppercase.
(Personally I'd want to be converting the Integs to Ints straight away, so would have done newtype Integ = Integ Int, and would probably have done newtype Var = Var Char but that might not suit you.)
Once you've done with the basic ones - dig and var, and neg_, plus_, in_ etcI'd go with the Applicative interface to build them up, so for example your parser expr for Expr would be something like
expr = Expr_I <$> integ
<|> Expr_Neg <$> neg_ *> expr
<|> Expr_Plus <$> plus_ *> expr <*> expr
<|> Expr_Times <$> times_ *> expr <*> expr
<|> Expr_Var <$> var
<|> Expr_let <$> let_ *> var <*> equals_ *> expr <*> in_ *> expr
So nearly all the time, your Haskell code is clean and closely resembles the grammar you were given.
OK, so it seems you're trying to build lots and lots of stuff, and you're not really sure exactly where it's all going. I would suggest that getting the definition for AST right, and then trying to implement evali would be a good start.
The grammer you've listed is interesting... You seem to want to input * 5 5, but output 5*5, whic is an odd choice. Is that really supposed to be a unary minus, not binary? Simlarly, * Expr Expr Var looks like perhaps you might have meant to type * Expr Expr | Var...
Anyway, making some assumptions about what you meant to say, your AST is going to look something like this:
data AST = Leaf Int | Sum AST AST | Minus AST | Var String | Let String AST AST
Now, let us try to do printE. It takes an AST and gives us a string. By the definition above, the AST has to be one of five possible things. You just need to figure out what to print for each one!
printE :: AST -> String
printE (Leaf x ) = show x
printE (Sum x y) = printE x ++ " + " ++ printE y
printE (Minus x ) = "-" ++ printE x
...
show turns an Int into a String. ++ joins two strings together. I'll let you work out the rest of the function. (The tricky thing is if you want it to print brackets to correctly show the order of subexpressions... Since your grammer doesn't mention brackets, I guess no.)
Now, how about evali? Well, this is going to be a similar deal. If the AST is a Leaf x, then x is an Int, and you just return that. If you have, say, Minus x, then x isn't an integer, it's an AST, so you need to turn it into an integer with evali. The function looks something like
evali :: AST -> Int
evali (Leaf x ) = x
evali (Sum x y) = (evali x) + (evali y)
evali (Minus x ) = 0 - (evali x)
...
That works great so far. But wait! It looks like you're supposed to be able to use Let to define new variables, and refer to them later with Var. Well, in that case, you need to store those variables somewhere. And that's going to make the function more complicated.
My recommendation would be to use Data.Map to store a list of variable names and their corresponding values. You'll need to add the variable map to a type signature. You can do it this way:
evali :: AST -> Int
evali ast = evaluate Data.Map.empty ast
evaluate :: Map String Int -> AST -> Int
evaluate m ast =
case ast of
...same as before...
Let var ast1 ast2 -> evaluate (Data.Map.insert var (evaluate m ast1)) ast2
Var var -> m ! var
So evali now just calls evaluate with an empty variable map. When evaluate sees Let, it adds the variable to the map. And when it sees Var, it looks up the name in the map.
As for parsing a string into an AST in the first place, that is an entire other answer again...
I want to write something like this:
type NumExp = Num of float
type Exp =
| Num of float
| Dot of NumExp * NumExp
| Op of string * Exp * Exp
let getValue (Num(n) : NumExp) = n
The compiler complains about a conflict between NumExp and Exp in getValue.
Even the following fails:
let getValue (nn : NumExp) = match nn with | Num(n) -> n
Is there a way to use the same case in both discriminated unions that works with functions? The DU definitions themselves are OK.
I want to use the same case to avoid adding a level of indirection like
type Exp =
| NumExpExp of NumExp
| Dot of NumExp * NumExp
| Op of string * Exp * Exp
in the Exp definition.
I feel I'm missing something very basic here.
The reason I have NumExp is that I want to be able to 'plug' 2 Exps into a Dot (rather than 2 floats) because it makes generating expressions easier, but they can't be any Exp, just numerical.
EDIT: what I really wanted to know is whether the two cases in the two DUs could be treated as the same entity (kind of like Exp "including" NumExp). I realise now Exp.Num and NumExp.Num are completely separate entities. Tomas provides a nice way of discriminating the two cases below.
If you have two discriminated unions with conflicting names of cases, you can use fully qualified name of the discriminated union case:
let getValue (NumExp.Num(n)) = n
A more complete example would look like this:
let rec eval = function
| Exp.Num(f) -> f
| Exp.Dot(NumExp.Num(f1), NumExp.Num(f2)) ->
// whatever 'dot' represents
| Exp.Op(op, e1, e2) ->
// operator
This always uses fully qualified names, which is probably a good idea if the names are simple enough and there are conflicting cases (which could lead to a confusion).
EDIT: Regarding sharing of cases - there is no automatic way of doing that, but you could have a case in your Exp that simply includes values of NumExp. For example like this:
type NumExp =
| Num of float
type Exp =
// first occurrence of NumExp is just a name, but F# allows us to reuse
// the name of the type, so we do that (you could use other name)
| NumExp of NumExp
// other cases
When writing eval function you would then write (note that we no longer have the issue with name clashes, so we don't need fully qualified names):
| NumExp(Num f) -> f
| Op(op, e1, e2) -> // ...
You can use interfaces as a substitute.
This adds a bit of syntactic overhead, but is the best way I've found to do this.
type IExp = interface end
type NumExp =
| Num of float
interface IExp
type Exp =
| Dot of NumExp * NumExp
| Op of string * IExp * IExp
interface IExp
// This function accepts both NumExp and Exp
let f (x:IExp) = match x with
| :? NumExp as e -> match e with
| Num v -> "Num"
| :? Exp as e -> match e with
| Dot (e1,e2) -> "Dot"
| Op (op,e1,e2) -> "Op"
| _ -> invalidArg "x" "Unsupported expression type"
// This function accepts only NumExp
let g = function
| Num v -> "Num"
When that is possible (e.g. using polymorphic variants in OCaml), you can do a lot with it but (sadly) F# does not have this language feature so it is currently incapable of expressing what you want using union types. However, you might consider using OOP instead...
Just an observation: Why do you need the unions constructed this way?
I would have chosen one of two options:
type NumExp = Num of float
type Exp =
| Num of float
| Dot of float * float
| Op of string * Exp * Exp
which is the more simple, or
type NumExp = Num of float
type Exp =
| NumExp
| Dot of float * float
| Op of string * Exp * Exp
In this second case, your function
let getValue (Num(n) : NumExp) = n
works as you have one definition of NumExp now.