Issue with recursion writing a tiny parser in Haskell. Check variables - parsing

I'm still working on a tiny parser for a tiny language defined in a task at school. The parser that generates an AST (abstract syntax tree) is working. What I want now is to check the defined variables: they must be bound by a let expression. First, the function signature given in the task (a suggestion, not required):
checkVars :: Expr -> Char
data Expr = Var Char | Tall Int | Sum Expr Expr | Mult Expr Expr | Neg Expr | Let Expr Expr Expr
  deriving (Eq, Show)
A valid sentence would be "let X be 5 in *(2,X)". X would normally be a Var and 5 would normally be an Int, and the last part can be any value of the Expr type. Main point: X is used somewhere in the last expression. The constructor for let is:
Let Expr Expr Expr
For reference, here are the other questions I've asked about this task:
First question
Second question
As you can see, checkVars takes an Expr, so here is an example of what I would feed to that function:
parseProg "let X be 4 in let Y be *(2 , X) in let Z be +(Y , X) in +(+(X , Y) , Z)"
Let (Var 'X') (Tall 4)
  (Let (Var 'Y') (Mult (Tall 2) (Var 'X'))
    (Let (Var 'Z') (Sum (Var 'Y') (Var 'X'))
      (Sum (Sum (Var 'X') (Var 'Y')) (Var 'Z'))))
Just 24
This is an all-inclusive example. The top part is the string/program being parsed. The second part, starting with Let, is the AST, which is the input for the checkVars function. The bottom part, "Just 24", is the evaluation, which I'll come back here to ask about separately.
Note: The point is to return the first unbound variable found as an error, and ' ' if everything is fine. Of course, if you want to do this another way, you can.

Here's something to think about:
The first field of your Let constructor is an Expr. But can it actually hold anything other than a Var? If not, you should reflect this by making that field's type, say, String, and adapting the parser correspondingly. This will make your task a lot easier.
The standard trick to evaluating an expression with let-bindings (which you are doing) is to write a function
type Env = [(String, Int)]
eval :: Expr -> Env -> Int
Note the extra argument for the environment. The environment keeps track of what variables are bound at any given moment to what values. Its position in the type means that you get to decide its value every time you call eval on child expressions. This is crucial! It also means you can have locally declared variables: binding a variable has no effect on its context, only on subexpressions.
Here are the special cases:
In a Var, you want to lookup the variable name in the environment and return the value that is bound to it. (Use the standard Prelude function lookup.)
In a Let, you want to add an extra (varname, value) to the front of the environment list before passing it on to the child expression.
I've left out some details, but this should be enough to get you going a long way. If you get stuck, ask another question. :-)
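Below is a minimal sketch of the environment-passing evaluator described above. It assumes the question's Expr type is kept unchanged, so variable names are Chars and the environment is [(Char, Int)] rather than the [(String, Int)] suggested above. As a first version, it uses error for unbound variables:

```haskell
-- The AST type from the question, unchanged.
data Expr = Var Char | Tall Int | Sum Expr Expr | Mult Expr Expr | Neg Expr | Let Expr Expr Expr
  deriving (Eq, Show)

-- Assumption: names are Chars here (matching Var Char), not Strings.
type Env = [(Char, Int)]

eval :: Expr -> Env -> Int
eval (Var v) env = case lookup v env of
  Just n -> n
  Nothing -> error ("unbound variable: " ++ [v])
eval (Tall n) _ = n
eval (Sum a b) env = eval a env + eval b env
eval (Mult a b) env = eval a env * eval b env
eval (Neg a) env = negate (eval a env)
-- The bound expression is evaluated in the outer environment. Only the body
-- sees the new binding, which is put in front, so it shadows older ones.
eval (Let (Var v) e body) env = eval body ((v, eval e env) : env)
eval (Let _ _ _) _ = error "the first field of Let must be a Var"
```

Suppose the example AST from the question is bound to a name such as example. Then eval example [] gives 24, the same value as the Just 24 shown there.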
Oh, and I see you want to return a Maybe value to indicate failure. I suggest you first try without and use error to indicate unbound variables. When you have that version of eval working, adapt it to return Maybe values. The reason for this is that working with Maybe values makes the evaluation quite a bit more complicated.
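Here is one possible Maybe-returning version, again assuming the unchanged Expr type. Maybe's Applicative and Monad instances do most of the plumbing, so a Nothing from any subexpression propagates automatically:

```haskell
data Expr = Var Char | Tall Int | Sum Expr Expr | Mult Expr Expr | Neg Expr | Let Expr Expr Expr
  deriving (Eq, Show)

type Env = [(Char, Int)]

-- Nothing signals an unbound variable (or a Let whose first field is not a Var).
evalM :: Expr -> Env -> Maybe Int
evalM (Var v) env = lookup v env
evalM (Tall n) _ = Just n
-- <$> and <*> combine two Maybe results and fail if either one is Nothing.
evalM (Sum a b) env = (+) <$> evalM a env <*> evalM b env
evalM (Mult a b) env = (*) <$> evalM a env <*> evalM b env
evalM (Neg a) env = negate <$> evalM a env
evalM (Let (Var v) e body) env = do
  n <- evalM e env             -- stops here with Nothing if e fails
  evalM body ((v, n) : env)
evalM (Let _ _ _) _ = Nothing
```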

I would actually try to evaluate the AST. Start by processing (and thus removing) all the Lets. Now, try to evaluate the resulting AST. If you run across a Var then there is an unbound variable.
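The sketch below follows this approach and uses the checkVars :: Expr -> Char signature from the question. Lets are eliminated by substitution, and any Var that survives is unbound. The helper names subst, removeLets and firstVar are made up for this illustration. "First" here means leftmost in the Let-free result, which can differ from the order in the source text:

```haskell
data Expr = Var Char | Tall Int | Sum Expr Expr | Mult Expr Expr | Neg Expr | Let Expr Expr Expr
  deriving (Eq, Show)

-- Replace every Var v in a Let-free expression with e.
subst :: Char -> Expr -> Expr -> Expr
subst v e ex = case ex of
    Var w
      | w == v -> e
      | otherwise -> Var w
    Tall n -> Tall n
    Sum a b -> Sum (go a) (go b)
    Mult a b -> Mult (go a) (go b)
    Neg a -> Neg (go a)
    Let {} -> error "subst: unexpected Let"
  where
    go = subst v e

-- Remove Lets bottom-up. The bound expression and the body are cleaned
-- first, so subst only ever sees Let-free expressions and shadowing works.
removeLets :: Expr -> Expr
removeLets ex = case ex of
    Let (Var v) e body -> subst v (removeLets e) (removeLets body)
    Let {} -> error "the first field of Let must be a Var"
    Sum a b -> Sum (removeLets a) (removeLets b)
    Mult a b -> Mult (removeLets a) (removeLets b)
    Neg a -> Neg (removeLets a)
    other -> other

-- Leftmost Var left in a Let-free expression, or ' ' if there is none.
firstVar :: Expr -> Char
firstVar ex = case ex of
    Var v -> v
    Tall _ -> ' '
    Sum a b -> firstOf a b
    Mult a b -> firstOf a b
    Neg a -> firstVar a
    Let {} -> error "firstVar: unexpected Let"
  where
    firstOf a b = case firstVar a of
      ' ' -> firstVar b
      v -> v

checkVars :: Expr -> Char
checkVars = firstVar . removeLets
```

Substitution copies the bound expression into every place it is used, so the intermediate tree can grow quickly for long chains of Lets. Carrying a list of bound names down the tree while checking avoids that copying.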

Related

Parse String to Datatype in Haskell

I'm taking a Haskell course at school, and I have to define a Logical Proposition datatype in Haskell. Everything so far works fine (definition and functions), and I've declared it as an instance of Ord, Eq and Show. The problem comes when I'm required to define a program that interacts with the user: I have to parse the user's input into my datatype:
type Var = String
data FProp = V Var
           | No FProp
           | Y FProp FProp
           | O FProp FProp
           | Si FProp FProp
           | Sii FProp FProp
where the formula: ¬q ^ p would be: (Y (No (V "q")) (V "p"))
I've been researching, and found that I can declare my datatype as an instance of Read.
Is this advisable? If it is, can I get some help in order to define the parsing method?
Not a complete answer, since this is a homework problem, but here are some hints.
The other answer suggested getLine followed by splitting at words. It sounds like you instead want something more like a conventional tokenizer, which would let you write things like:
(Y
(No (V q))
(V p))
Here’s one implementation that turns a string into tokens that are either a string of alphanumeric characters or a single, non-alphanumeric printable character. You would need to extend it to support quoted strings:
import Data.Char
type Token = String
tokenize :: String -> [Token]
{- Here, a token is either a string of alphanumeric characters, or else one
 - non-spacing printable character, such as "(" or ")".
 -}
tokenize [] = []
tokenize (x:xs) | isSpace x = tokenize xs
                | not (isPrint x) = error $
                    "Invalid character " ++ show x ++ " in input."
                | not (isAlphaNum x) = [x]:(tokenize xs)
                | otherwise = let (token, rest) = span isAlphaNum (x:xs)
                              in token:(tokenize rest)
It turns the example into ["(","Y","(","No","(","V","q",")",")","(","V","p",")",")"]. Note that you have access to the entire repertoire of Unicode.
The main function that evaluates this interactively might look like:
main = interact ( unlines . map show . map evaluate . parse . tokenize )
Where parse turns a list of tokens into a list of ASTs and evaluate turns an AST into a printable expression.
As for implementing the parser, your language appears to have a syntax similar to LISP, which is one of the simplest languages to parse; you don’t even need precedence rules. A recursive-descent parser could do it, and is probably the easiest to implement by hand. You can pattern-match on the leading token with an equation like parse ("(":xs) = ..., and the same pattern-matching syntax makes lookahead easy: parse ("(":x1:xs) = ... looks ahead one token.
If you’re calling the parser recursively, you would define a helper function that consumes only a single expression, and that has a type signature like :: [Token] -> (AST, [Token]). This lets you parse the inner expression, check that the next token is ")", and proceed with the parse. However, externally, you’ll want to consume all the tokens and return an AST or a list of them.
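The following is a hand-written recursive-descent sketch in the [Token] -> (AST, [Token]) style described above, reusing the tokenizer from this answer. The names parseExpr, parseProp and binOps are illustrative. Malformed input is reported with error rather than with a proper error type:

```haskell
import Data.Char (isAlphaNum, isPrint, isSpace)

type Var = String

data FProp = V Var
           | No FProp
           | Y FProp FProp
           | O FProp FProp
           | Si FProp FProp
           | Sii FProp FProp
  deriving (Eq, Show)

type Token = String

-- The tokenizer from the answer above.
tokenize :: String -> [Token]
tokenize [] = []
tokenize (x:xs)
  | isSpace x = tokenize xs
  | not (isPrint x) = error ("Invalid character " ++ show x ++ " in input.")
  | not (isAlphaNum x) = [x] : tokenize xs
  | otherwise = let (token, rest) = span isAlphaNum (x:xs) in token : tokenize rest

-- Binary connectives, looked up by name.
binOps :: [(String, FProp -> FProp -> FProp)]
binOps = [("Y", Y), ("O", O), ("Si", Si), ("Sii", Sii)]

-- Parse one expression and return it with the tokens that follow it.
parseExpr :: [Token] -> (FProp, [Token])
parseExpr ("(" : "V" : name : ")" : rest) = (V name, rest)
parseExpr ("(" : "No" : rest) = case parseExpr rest of
    (p, ")" : rest') -> (No p, rest')
    _ -> error "expected ) after the argument of No"
parseExpr ("(" : op : rest)
  | Just con <- lookup op binOps =
      let (p, rest1) = parseExpr rest
          (q, rest2) = parseExpr rest1
      in case rest2 of
           ")" : rest3 -> (con p q, rest3)
           _ -> error ("expected ) to close " ++ op)
parseExpr ts = error ("cannot parse at: " ++ unwords (take 3 ts))

-- External entry point: every token must be consumed.
parseProp :: String -> FProp
parseProp s = case parseExpr (tokenize s) of
    (p, []) -> p
    (_, extra) -> error ("unexpected trailing tokens: " ++ unwords extra)
```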
The stylish way to write a parser is with monadic parser combinators. (And maybe someone will post an example of one.) The industrial-strength solution would be a library like Parsec, but that’s probably overkill here. Still, parsing is (mostly!) a solved problem, and if you just want to get the assignment done on time, using a library off the shelf is a good idea.
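For comparison, here is a small combinator-based sketch. It uses ReadP from base (Text.ParserCombinators.ReadP) rather than Parsec, so it needs no extra packages. The helper names tok, keyword, prop and parseFProp are made up for this example. The assumed grammar is the fully parenthesised (Constructor args) form shown above, parsed directly from characters without a separate tokenizer:

```haskell
import Control.Monad (guard)
import Data.Char (isAlphaNum)
import Text.ParserCombinators.ReadP

type Var = String

data FProp = V Var | No FProp | Y FProp FProp | O FProp FProp | Si FProp FProp | Sii FProp FProp
  deriving (Eq, Show)

-- Run p, then skip any trailing whitespace (including newlines).
tok :: ReadP a -> ReadP a
tok p = p <* skipSpaces

-- Match a whole alphanumeric word, so "Si" cannot match the start of "Sii".
keyword :: String -> ReadP ()
keyword w = tok (munch1 isAlphaNum >>= guard . (== w))

-- A parenthesised proposition such as (Y (No (V q)) (V p)).
prop :: ReadP FProp
prop = between (tok (char '(')) (tok (char ')')) body
  where
    body = choice
      [ V <$> (keyword "V" *> tok (munch1 isAlphaNum))
      , No <$> (keyword "No" *> prop)
      , binary "Y" Y
      , binary "O" O
      , binary "Si" Si
      , binary "Sii" Sii
      ]
    binary name con = con <$> (keyword name *> prop) <*> prop

-- Succeed only if the whole input parses, with exactly one result.
parseFProp :: String -> Maybe FProp
parseFProp s = case [p | (p, "") <- readP_to_S (skipSpaces *> prop <* eof) s] of
    [p] -> Just p
    _ -> Nothing
```

Each alternative in body mirrors one constructor of FProp, which is the main appeal of the combinator style.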
The read part of a REPL interpreter typically looks like this:
import System.IO (hFlush, stdout)
repl :: ForthState -> IO ()  -- the read-eval-print loop
repl state
  = do putStr "> "            -- print a > prompt to show it's waiting for input
       hFlush stdout          -- make sure the prompt appears before getLine blocks
       input <- getLine       -- this is what you're looking for: read one line
       if input == "quit"     -- let the user quit the interpreter
         then putStrLn "Bye!"
         else let (is, cs, d, output) = eval (words input) state  -- your grammar is applied somewhere down the chain when eval is called on the input
              in do mapM_ putStrLn output
                    repl (is, cs, d, [])
main = do putStrLn "Welcome to your very own interpreter!"
          repl initialForthState  -- start the loop with the initial state
Your eval function will have various loops, stack manipulations, conditionals, etc. to actually figure out what the user entered. I hope this helps you with at least the input-reading part.

Parse Error: (incorrect indentation or misplaced bracket)

I'm starting out to learn Haskell. Even though I'm a dunce extraordinaire, I am intent on making this work. The error I received is listed as the title. This is the code that I wrote to try to implement the behavior of replicating a list (n) times and concatenating its new length as a new list. Now I have a basic understanding of how parsing works in Haskell, below my original code I will give example of some modified code to see if my understanding on parsing is adequate. My question for now is how I can properly indent or structure my block in order to not receive this error (is that specific enough :O) -- is there a piece of information I'm missing when it comes to creating instances and formatting? PLEASE DO NOT TELL ME OR OFFER SUGGESTIONS IF YOU NOTICE THAT MY CURRENT INSTANCE OR MAIN FUNCTION ARE SYNTACTICALLY WRONG. I want to figure it out and will deal with that GHC error when I get to it. (I hope that's the proper way to learn). BUT if I could ask for anyone's help in getting past this first obstacle in understanding proper formatting, I'd be grateful.
module Main where
import Data.List
n :: Int
x :: [Char]
instance Data stutter n x where
x = []
n = replicate >>= x : (n:xs)
stutter >>= main = concat [x:xs]
let stutter 6 "Iwannabehere" -- <-- parse error occurs here!!!
--Modified code with appropriate brackets, at least where I think they go.
module Main where
import Data.List
n :: Int
x :: [Char]
instance Data stutter n x where{
;x = []
;n = replicate >>= x : (n:xs)
;stutter >>= main = concat [x:xs]
;
};let stutter 6 "Iwannabehere" -- there should be no bracket of any kind at the end of this
I placed the 'let' expression on the outside of the block, I don't believe it goes inside and I also receive a parsing error if I do that. Not correct but I thought I'd ask anyway.
I'm not sure what instance Data stutter n x is supposed to be (the instance XYZ where syntax is used solely for typeclass instances), but you have a couple of syntax errors here.
First of all, while GHC says that the error is on let stutter 6 "Iwannabehere", your first error occurs before that with stutter >>= main = concat [x:xs]. A single = sign is reserved for assignments, which are merely definitions. You can have assignments at the top level, inside a where block, or inside a let block (the where includes typeclass instance definitions). You can't have an assignment be part of an expression like x >>= y = z.
Your next syntax error is the let itself. let blocks cannot appear at the top level; they only appear within another definition. You use let in GHCi, but the reasons for that are outside the scope of this answer. Suffice it to say that entering expressions in GHCi is not equivalent to writing at the top level of a source file.
Next, if you were to use a let block somewhere, it can only contain definitions. The syntax looks more like
let <name> [<args>] = <definition>
    [<name> [<args>] = <definition>]
in <expression>
And this whole block makes an expression. For example, you could write
def f(x, y, z):
    w = x + y + z
    u = x - y - z
    return w * u
in Python, and this would be equivalent to the Haskell function definition
f x y z = let w = x + y + z
              u = x - y - z
          in w * u
It only defines local variables. There is another form when you're using it inside do blocks where you can exclude the in <expression> part, such as
main = do
    name <- getLine
    let message = if length name > 5 then "long name" else "short name"
        goodbye n = putStrLn ("Goodbye, " ++ n)
    putStrLn message
    goodbye name
Note that there is no need to use in here. You can if you want, it just means you have to start a new do block:
main = do
    name <- getLine
    let message = ...
        goodbye n = ...
      in do
        putStrLn message
        goodbye name
And this isn't as pretty.
Hopefully this points you more towards correct syntax, but it looks like you have some misunderstandings about how Haskell works. Have you looked at Learn You a Haskell? It's a pretty gentle and fun introduction to the language that can really help you learn the syntax and core ideas.
Your parse error is from the let keyword. Remove it and no error related to that will occur. A bare let x = y (without in) is only valid in GHCi and in do blocks, neither of which applies here. Essentially, just replace that line with this one:
theWordIGet = stutter 6 "Iwannabehere"
Secondly, the instance keyword in Haskell has absolutely nothing to do with what you want to do at this stage. This is not how Haskell functions are defined, which is what I'm guessing you want to do. Here is how you would define a stutter function, assuming it simply repeats a string n times:
stutter :: Int -> String -> String
stutter n x = concat (replicate n x)
You'll also want to remove the type declarations for the (out-of-scope) values n and x: they're not objects, they're arguments for a function, which has its own signature determining the types of n and x within a function call.
Lastly, I imagine you will want to print the value of stutter 6 "Iwannabehere" when the program is executed. To do that, just add this:
main :: IO ()
main = print (stutter 6 "Iwannabehere")
In conclusion, I implore you to start from scratch and read 'Learn You a Haskell' online here, because you're going off in entirely the wrong direction: the program you've quoted is a jumble of expressions that could have a meaning, but are in entirely the wrong place. The book will show you the syntax of Haskell much better than I can in this one answer, and will fully explain how to make your program behave the way you expect.

try function in parsing lambda expressions

I'm totally new to Haskell and trying to implement a "Lambda calculus" parser that will be used to read the input to a lambda reducer. It's required to parse bindings first ("identifier = expression;") from a text file; then at the end there's an expression alone.
So far it can parse bindings only, and displays errors when encountering an expression alone. When I try to use the try or option functions, it gives a type mismatch error:
Couldn't match type `[Expr]'
with `Text.Parsec.Prim.ParsecT s0 u0 m0 [[Expr]]'
Expected type: Text.Parsec.Prim.ParsecT
s0 u0 m0 (Text.Parsec.Prim.ParsecT s0 u0 m0 [[Expr]])
Actual type: Text.Parsec.Prim.ParsecT s0 u0 m0 [Expr]
In the second argument of `option', namely `bindings'
bindings weren't supposed to return anything, but I tried to add a return statement and it also returned a type mismatch error:
Couldn't match type `[Expr]' with `Expr'
Expected type: Text.Parsec.Prim.ParsecT
[Char] u0 Data.Functor.Identity.Identity [Expr]
Actual type: Text.Parsec.Prim.ParsecT
[Char] u0 Data.Functor.Identity.Identity [[Expr]]
In the second argument of `(<|>)', namely `expressions'
Don't use <|> if you want to allow both
Your program parser does its main work with
program = do
    spaces
    try bindings <|> expressions
    spaces >> eof
This <|> is choice - it does bindings if it can, and if that fails, expressions, which isn't what you want. You want zero or more bindings, followed by expressions, so let's make it do that.
Sadly, even when this works, the last line of your parser is eof and
First, let's allow zero bindings, since they're optional, then let's get both the bindings and the expressions:
bindings = many binding
program = do
    spaces
    bs <- bindings
    es <- expressions
    spaces >> eof
    return (bs,es)
This error would be easier to find with plenty more <?> "binding" type hints so you can see more clearly what was expected.
endBy doesn't need many
The error message you have stems from the line
expressions = many (endBy expression eol)
which should be
expressions :: Parser [Expr]
expressions = endBy expression eol
endBy works like sepBy - you don't need to use many on it because it already parses many.
This error would have been easier to find with a stronger data type tree, so:
Use try to deal with common prefixes
One of the hard-to-debug problems you've had is when you get the error expecting space or "=" whilst parsing an expression. If we think about that, the only place we expect = is in a binding, so it must be part way through parsing a binding when we've given it an expression. This only happens if our expression starts with an identifier, just like a binding does.
binding sees the first identifier and says "It's OK guys, I've got this" but then finds no = and gives you an error, where we wanted it to backtrack and let expression have a go. The key point is we've already used the identifier input, and we want to unuse it. try is right for that.
Encase your binding parser with try so if it fails, we'll go back to the start of the line and hand over to expression.
binding = try (do
    (Var id) <- identifier
    _ <- char '='
    spaces
    exp <- expression
    spaces
    eol <?> "end of line"
    return $ Eq id exp
  <?> "binding")
It's important that as far as possible each parser starts with matching something unique to avoid this problem. (try is backtracking, hence inefficient, so should be avoided if possible.)
In particular, avoid starting parsers with spaces, but instead make sure you finish them all with spaces. Your main program can start with spaces if you like, since it's the only alternative.
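The ideas above can be combined in a small self-contained Parsec sketch: zero or more bindings followed by an expression, try placed only around the shared prefix, and parsers that consume trailing rather than leading whitespace. The grammar is a simplified stand-in assumed for illustration. An expression is just one or more identifiers applied left to right, and every binding ends in ';'. Unlike the binding above, only the "name =" prefix is wrapped in try, so once the = has been seen the parser commits:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical mini-language: "name = expr;" bindings, then one expression.
data Expr = Var String | App Expr Expr deriving (Eq, Show)
data Binding = Binding String Expr deriving (Eq, Show)
data Program = Program [Binding] Expr deriving (Eq, Show)

-- Every token parser consumes the whitespace after it,
-- so no parser needs to start with spaces.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: Char -> Parser Char
symbol c = lexeme (char c)

identifier :: Parser String
identifier = lexeme (many1 letter) <?> "identifier"

-- One or more identifiers, applied left to right: f x y = App (App f x) y.
expression :: Parser Expr
expression = foldl1 App . map Var <$> many1 identifier <?> "expression"

-- Bindings and expressions both start with an identifier, so only the shared
-- prefix "name =" is wrapped in try. After the '=', a missing ';' is reported
-- as an error instead of silently backtracking into the expression parser.
binding :: Parser Binding
binding = (try (Binding <$> identifier <* symbol '=') <*> expression <* symbol ';')
  <?> "binding"

-- Zero or more bindings, then exactly one expression, then end of input.
program :: Parser Program
program = Program <$> (spaces *> many binding) <*> expression <* eof

parseProgram :: String -> Either ParseError Program
parseProgram = parse program "<input>"
```

With this, parseProgram "f x" succeeds with no bindings, while parseProgram "a = b c" fails on the missing ';'.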
Use types for most productions - better structure & readability
My first piece of general advice is that you could do with a more fine-grained data type, and should annotate your parsers with their type. At the moment, everything's wrapped up in Expr, which means you can only get error messages about whether you have an Expr or a [Expr]. The fact that you had to add Eq to Expr is a sign you're pushing the type too far.
Usually it's worth making a data type for quite a lot of the productions, and if you import Control.Applicative hiding ((<|>),(<$>),many) you can use <$> and <*> so that the production, the datatype and the parser all have the same structure:
--<program> ::= <spaces> [<bindings>] <expressions>
data Program = Prog [Binding] [Expr]
program = spaces >> Prog <$> bindings <*> expressions
-- <expression> ::= <abstraction> | <factors>
data Expression = Ab Abstraction | Fa [Factor]
expression = Ab <$> abstraction <|> Fa <$> factors <?> "expression"
Don't do this with letters for example, but for important things. What counts as important things is a matter of judgement, but I'd start with Identifiers. (You can use <* or *> to not include syntax like = in the results.)
Amended code:
Before refactoring types and using Applicative here
And afterwards here

Using Parsec to write a Read instance

Using Parsec, I'm able to write a function of type String -> Maybe MyType with relative ease. I would now like to create a Read instance for my type based on that; however, I don't understand how readsPrec works or what it is supposed to do.
My best guess right now is that readsPrec is used to build a recursive parser from scratch to traverse a string, building up the desired datatype in Haskell. However, I already have a very robust parser who does that very thing for me. So how do I tell readsPrec to use my parser? What is the "operator precedence" parameter it takes, and what is it good for in my context?
If it helps, I've created a minimal example on Github. It contains a type, a parser, and a blank Read instance, and reflects quite well where I'm stuck.
(Background: The real parser is for Scheme.)
However, I already have a very robust parser who does that very thing for me.
It's actually not that robust: your parser has problems with superfluous parentheses. It won't parse
((1) (2))
for example, and it will throw an exception on some malformed inputs, because
singleP = Single . read <$> many digit
may use read "" :: Int.
That out of the way, the precedence argument is used to determine whether parentheses are necessary in some place, e.g. if you have
infixr 6 :+:
data a :+: b = a :+: b
data C = C Int
data D = D C
you don't need parentheses around a C 12 as an argument of (:+:), since the precedence of application is higher than that of (:+:), but you'd need parentheses around C 12 as an argument of D.
So you'd usually have something like
readsPrec p = needsParens (p >= precedenceLevel) someParser
where someParser parses a value from the input without enclosing parentheses, and needsParens True thing parses a thing between parentheses, while needsParens False thing parses a thing optionally enclosed in parentheses [you should always accept more parentheses than necessary, ((((((1)))))) should parse fine as an Int].
Since the readsPrec p parsers are used to parse parts of the input as parts of the value when reading lists, tuples etc., they must return not only the parsed value, but also the remaining part of the input.
With that, a simple way to transform a parsec parser to a readsPrec parser would be
withRemaining :: Parser a -> Parser (a, String)
withRemaining p = (,) <$> p <*> getInput
parsecToReadsPrec :: Parser a -> Int -> ReadS a
parsecToReadsPrec parsecParser prec input
  = case parse (withRemaining $ needsParens (prec >= threshold) parsecParser) "" input of
      Left _ -> []
      Right result -> [result]
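The next sketch makes this fragment concrete and self-contained in Parsec by filling in needsParens and threshold. The Single type, the singleP parser and the threshold value 11 are assumptions for illustration; the threshold means parentheses are required above application precedence, as in derived instances. singleP uses many1 digit, which avoids the read "" problem mentioned earlier:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical example type.
data Single = Single Int deriving (Eq, Show)

-- many1 (not many), so read is never applied to "".
singleP :: Parser Single
singleP = string "Single" *> spaces *> (Single . read <$> many1 digit) <* spaces

-- needsParens True p:  p inside at least one pair of parentheses.
-- needsParens False p: p with optional parentheses; extra pairs are accepted.
needsParens :: Bool -> Parser a -> Parser a
needsParens True p = between (char '(' <* spaces) (char ')' <* spaces) (needsParens False p)
needsParens False p = needsParens True p <|> p

withRemaining :: Parser a -> Parser (a, String)
withRemaining p = (,) <$> p <*> getInput

-- Assumed threshold: parentheses are mandatory when the context precedence
-- is 11 or more, i.e. when the value is an argument of a constructor.
threshold :: Int
threshold = 11

parsecToReadsPrec :: Parser a -> Int -> ReadS a
parsecToReadsPrec parsecParser prec input =
  case parse (withRemaining (spaces *> needsParens (prec >= threshold) parsecParser)) "" input of
    Left _ -> []
    Right result -> [result]

instance Read Single where
  readsPrec = parsecToReadsPrec singleP
```

In GHCi, read "((Single 7))" :: Single accepts the redundant parentheses. parsecToReadsPrec singleP 11 "Single 5" returns [] because parentheses are mandatory at that precedence.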
If you're using GHC, it may however be preferable to use a ReadPrec / ReadP parser (built using Text.ParserCombinators.ReadP[rec]) instead of a parsec parser and define readPrec instead of readsPrec.
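Here is a sketch of the ReadPrec route that uses only base's Text.Read, for the C and D types from the example above. prec and step play the role of the precedence argument, and parens plays the role of needsParens by accepting both required and redundant parentheses:

```haskell
import Text.Read (Lexeme (Ident), Read (..), lexP, parens, prec, readListPrecDefault, readMaybe, step)

data C = C Int deriving (Eq, Show)
data D = D C deriving (Eq, Show)

-- prec 10 fails when the surrounding precedence is above 10 (e.g. when a C
-- is a constructor argument). parens then accepts the value inside explicit
-- parentheses, which resets the precedence, and also allows redundant ones.
instance Read C where
  readPrec = parens $ prec 10 $ do
    Ident "C" <- lexP
    n <- step readPrec          -- the argument is read at precedence 11
    return (C n)
  readListPrec = readListPrecDefault

instance Read D where
  readPrec = parens $ prec 10 $ do
    Ident "D" <- lexP
    c <- step readPrec          -- so a C argument needs parentheses
    return (D c)
  readListPrec = readListPrecDefault

-- Convenience wrapper for trying the instances out.
readD :: String -> Maybe D
readD = readMaybe
```

For example, readD "D (C 12)" gives Just (D (C 12)). readD "D C 12" gives Nothing, because C in argument position requires parentheses.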

Can an interpreter be implemented with a symbol table?

Often I hear that using a symbol table optimizes look-ups of symbols in a programming language. Currently, my language is implemented only as an interpreter, not as a compiler. I do not yet want to allocate the time to build a compiler, so I'm attempting to optimize the interpreter. The language is based on Scheme semantics and syntax for the most part, and is statically scoped. I use the AST for executing code at run-time (in my interpreter, the AST is implemented as discriminated unions, just like the AST in Write Yourself a Scheme in 48 Hours).
Unfortunately, symbol look-up in my interpreter is slow due to the use of an F# Map to contain and look up symbols by name. (Well, in truth, it uses a Trie, but the performance is similarly problematic). I would like to instead use a symbol tree to achieve faster symbol lookup. However, I don't know if or how one can implement symbols tables in an interpreter. I hear about them only in the context of a compiler.
Is this possible? If the implementation strategy or performance differs from a symbol table in a compiler, could you describe the differences? Finally, is there an existing reference implementation of a symbol tree in an interpreter I might look at?
Thank you!
A symbol table associates some information with every symbol. In an interpreter, you would perhaps associate values with symbols. Map is one implementation particularly suitable for functional interpreters.
If you want to optimize your interpreter, get rid of the need for a symbol table at runtime. One way to go is De Bruijn indexing.
There is also nice literature on mechanically deriving optimized interpreters, VMs and compilers from a functional interpreter, for example:
http://www.brics.dk/RS/03/14/BRICS-RS-03-14.pdf
For a simple example, consider lambda calculus with constants encoded with De Bruijn indices. Notice that the evaluator gets by without a symbol table, because it can use integers for lookup.
type exp =
    | App of exp * exp
    | Const of int
    | Fn of exp
    | Var of int
type value =
    | Closure of exp * env
    | Number of int
and env = value[]
let lookup env i = Array.get env i
let extend value env = Array.append [| value |] env
let empty () : env = Array.empty
let eval exp =
    let rec eval env exp =
        match exp with
        | App (f, x) ->
            match eval env f with
            | Closure (bodyF, envF) ->
                let vx = eval env x
                eval (extend vx envF) bodyF
            | _ -> failwith "?"
        | Const x -> Number x
        | Fn e -> Closure (e, env)
        | Var x -> lookup env x
    eval (empty ()) exp
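The names do not disappear entirely: they are resolved once, before evaluation. The following is a minimal Haskell sketch of that split. A hypothetical named AST (Named) is translated into the index-based form used by the F# code (Exp). The scope list used during translation acts as the symbol table, and evaluation only ever indexes into the environment. Lists are used for brevity, so lookup here is linear in the index; the F# version uses arrays instead:

```haskell
import Data.List (elemIndex)

-- Hypothetical named AST, as a parser might produce it.
data Named = NVar String | NConst Int | NFn String Named | NApp Named Named

-- Index-based AST, mirroring the F# exp type above.
data Exp = Var Int | Const Int | Fn Exp | App Exp Exp
  deriving (Eq, Show)

data Value = Number Int | Closure Exp [Value]
  deriving (Eq, Show)

-- Resolve names once, before evaluation. The scope list is the
-- compile-time symbol table: index 0 is the innermost binder.
toDeBruijn :: [String] -> Named -> Either String Exp
toDeBruijn scope (NVar x) = maybe (Left ("unbound variable: " ++ x)) (Right . Var) (elemIndex x scope)
toDeBruijn _ (NConst n) = Right (Const n)
toDeBruijn scope (NFn x body) = Fn <$> toDeBruijn (x : scope) body
toDeBruijn scope (NApp f a) = App <$> toDeBruijn scope f <*> toDeBruijn scope a

-- Evaluation never looks at names: a variable is just a position in env.
eval :: [Value] -> Exp -> Value
eval env (Var i) = env !! i
eval _ (Const n) = Number n
eval env (Fn body) = Closure body env
eval env (App f a) = case eval env f of
    Closure body env' -> eval (eval env a : env') body
    Number _ -> error "cannot apply a number"
```

For instance, translating (\x. \y. x) 1 2 and evaluating it with an empty environment yields Number 1.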
