I'm totally new to Haskell and trying to implement a "Lambda calculus" parser, that will be used to read the input to a lambda reducer .. It's required to parse bindings first "identifier = expression;" from a text file, then at the end there's an expression alone ..
till now it can parse bindings only, and displays errors when encountering an expression alone .. when I try to use the try or option functions, it gives a type mismatch error:
Couldn't match type `[Expr]'
with `Text.Parsec.Prim.ParsecT s0 u0 m0 [[Expr]]'
Expected type: Text.Parsec.Prim.ParsecT
s0 u0 m0 (Text.Parsec.Prim.ParsecT s0 u0 m0 [[Expr]])
Actual type: Text.Parsec.Prim.ParsecT s0 u0 m0 [Expr]
In the second argument of `option', namely `bindings'
bindings weren't supposed to return anything, but I tried to add a return statement and it also returned a type mismatch error:
Couldn't match type `[Expr]' with `Expr'
Expected type: Text.Parsec.Prim.ParsecT
[Char] u0 Data.Functor.Identity.Identity [Expr]
Actual type: Text.Parsec.Prim.ParsecT
[Char] u0 Data.Functor.Identity.Identity [[Expr]]
In the second argument of `(<|>)', namely `expressions'
Don't use <|> if you want to allow both
Your program parser does its main work with
program = do
spaces
try bindings <|> expressions
spaces >> eof
This <|> is choice - it does bindings if it can, and if that fails, expressions, which isn't what you want. You want zero or more bindings, followed by expressions, so let's make it do that.
Sadly, even when this works, the last line of your parser is eof and
First, let's allow zero bindings, since they're optional, then let's get both the bindings and the expressions:
bindings = many binding
program = do
spaces
bs <- bindings
es <- expressions
spaces >> eof
return (bs,es)
This error would be easier to find with plenty more <?> "binding" type hints so you can see more clearly what was expected.
endBy doesn't need many
The error message you have stems from the line
expressions = many (endBy expression eol)
which should be
expressions :: Parser [Expr]
expressions = endBy expression eol
endBy works like sepBy - you don't need to use many on it because it already parses many.
This error would have been easier to find with a stronger data type tree, so:
Use try to deal with common prefixes
One of the hard-to-debug problems you've had is when you get the error expecting space or "=" whilst parsing an expression. If we think about that, the only place we expect = is in a binding, so it must be part way through parsing a binding when we've given it an expression. This only happens if our expression starts with an identifier, just like a binding does.
binding sees the first identifier and says "It's OK guys, I've got this" but then finds no = and gives you an error, where we wanted it to backtrack and let expression have a go. The key point is we've already used the identifier input, and we want to unuse it. try is right for that.
Encase your binding parser with try so if it fails, we'll go back to the start of the line and hand over to expression.
binding = try (do
(Var id) <- identifier
_ <- char '='
spaces
exp <- expression
spaces
eol <?> "end of line"
return $ Eq id exp
<?> "binding")
It's important that as far as possible each parser starts with matching something unique to avoid this problem. (try is backtracking, hence inefficient, so should be avoided if possible.)
In particular, avoid starting parsers with spaces, but instead make sure you finish them all with spaces. Your main program can start with spaces if you like, since it's the only alternative.
Use types for most productions - better structure & readability
My first piece of general advice is that you could do with a more fine-grained data type, and should annotate your parsers with their type. At the moment, everything's wrapped up in Expr, which means you can only get error messages about whether you have an Expr or a [Expr]. The fact that you had to add Eq to Expr is a sign you're pushing the type too far.
Usually it's worth making a data type for quite a lot of the productions, and if you import Control.Applicative hiding ((<|>),(<$>),many) Control.Applicative you can use <$> and <*> so that the production, the datatype and the parser are all the same structure:
--<program> ::= <spaces> [<bindings>] <expressions>
data Program = Prog [Binding] [Expr]
program = spaces >> Prog <$> bindings <*> expressions
-- <expression> ::= <abstraction> | factors
data Expression = Ab Abstraction | Fa [Factor]
expression = Ab <$> abstraction <|> Fa <$> factors <?> "expression"
Don't do this with letters for example, but for important things. What counts as important things is a matter of judgement, but I'd start with Identifiers. (You can use <* or *> to not include syntax like = in the results.)
Amended code:
Before refactoring types and using Applicative here
And afterwards here
Related
I am trying to wrap my head around writing parser using parsec in Haskell, in particular how backtracking works.
Take the following simple parser:
import Text.Parsec
type Parser = Parsec String () String
parseConst :: Parser
parseConst = do {
x <- many digit;
return $ read x
}
parseAdd :: Parser
parseAdd = do {
l <- parseExp;
char '+';
r <- parseExp;
return $ l <> "+" <> r
}
parseExp :: Parser
parseExp = try parseConst <|> parseAdd
pp :: Parser
pp = parseExp <* eof
test = parse pp "" "1+1"
test has value
Left (line 1, column 2):
unexpected '+'
expecting digit or end of input
In my mind this should succeed since I used the try combinator on parseConst in the definition of parseExp.
What am I missing? I am also interrested in pointers for how to debug this in my own, I tried using parserTraced which just allowed me to conclude that it indeed wasn't backtracking.
PS.
I know this is an awful way to write an expression parser, but I'd like to understand why it doesn't work.
There are a lot of problems here.
First, parseConst can never work right. The type says it must produce a String, so read :: String -> String. That particular Read instance requires the input be a quoted string, so being passed 0 or more digit characters to read is always going to result in a call to error if you try to evaluate the value it produces.
Second, parseConst can succeed on matching zero characters. I think you probably wanted some instead of many. That will make it actually fail if it encounters input that doesn't start with a digit.
Third, (<|>) doesn't do what you think. You might think that (a <* c) <|> (b <* c) is interchangeable with (a <|> b) <* c, but it isn't. There is no way to throw try in and make it the same, either. The problem is that (<|>) commits to whichever branch succeed, if one does. In (a <|> b) <* c, if a matches, there's no way later to backtrack and try b there. Doesn't matter how you lob try around, it can't undo the fact that (<|>) committed to a. In contrast, (a <* c) <|> (b <* c) doesn't commit until both a and c or b and c match the input.
This is the situation you're encountering. You have (try parseConst <|> parseAdd) <* eof, after a bit of inlining. Since parseConst will always succeed (see the second issue), parseAdd will never get tried, even if the eof fails. So after parseConst consumes zero or more leading digits, the parse will fail unless that's the end of the input. Working around this essentially requires carefully planning your grammar such that any use of (<|>) is safe to commit locally. That is, the contents of each branch must not overlap in a way that is disambiguated only by later portions of the grammar.
Note that this unpleasant behavior with (<|>) is how the parsec family of libraries work, but not how all parser libraries in Haskell work. Other libraries work without the left bias or commit behavior the parsec family have chosen.
I am trying to parse a String using parsec in Haskell, however every attempt throws another type of error.
import Text.ParserCombinators.Parsec
csvFile = endBy line eol
line = sepBy cell (char ',')
cell = many (noneOf ",\n")
eol = char '\n'
parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" input
This code, when run through stack ghci produces an error saying "non type-variable argument in the constraint: Text.Parsec.Prim.Stream"
Basically, I am wondering what the most straight forward way to parse a String into tokens based on commas is in Haskell. It seems like a very straightforward concept and I assumed that it would be a great learning experience, but so far it has produced nothing but errors.
The error I see when entering char '\n' in ghci is:
<interactive>:4:1: error:
• Non type-variable argument
in the constraint: Text.Parsec.Prim.Stream s m Char
(Use FlexibleContexts to permit this)
• When checking the inferred type
it :: forall s (m :: * -> *) u.
Text.Parsec.Prim.Stream s m Char =>
Text.Parsec.Prim.ParsecT s u m Char
The advice about FlexibleContexts is accurate. You can turn on FlexibleContexts like so:
*Main> :set -XFlexibleContexts
Unfortunately, the next error is • No instance for (Show (Text.Parsec.Prim.ParsecT s0 u0 m0 Char)) (basically, we can't print a function) so you'll still need to apply the parser to some input to actually run it.
Like commenters, I find that parseCSV can be used without any language extensions.
There are a few things going on here:
In the context of the whole program, the type of eol is constrained by the type signature on parseCSV. That doesn't happen when typing eol = char '\n' into GHCi.
GHCi's :t is permissive - it's willing to print some types that use language features that aren't turned on.
GHC has grown by adding a large number of language extensions, which can be turned on by the programmer on a per-module basis. Some are widely used by production-ready libraries, others are new & experimental.
I'm taking a Haskell course at school, and I have to define a Logical Proposition datatype in Haskell. Everything so far Works fine (definition and functions), and i've declared it as an instance of Ord, Eq and show. The problem comes when I'm required to define a program which interacts with the user: I have to parse the input from the user into my datatype:
type Var = String
data FProp = V Var
| No FProp
| Y FProp FProp
| O FProp FProp
| Si FProp FProp
| Sii FProp FProp
where the formula: ¬q ^ p would be: (Y (No (V "q")) (V "p"))
I've been researching, and found that I can declare my datatype as an instance of Read.
Is this advisable? If it is, can I get some help in order to define the parsing method?
Not a complete answer, since this is a homework problem, but here are some hints.
The other answer suggested getLine followed by splitting at words. It sounds like you instead want something more like a conventional tokenizer, which would let you write things like:
(Y
(No (V q))
(V p))
Here’s one implementation that turns a string into tokens that are either a string of alphanumeric characters or a single, non-alphanumeric printable character. You would need to extend it to support quoted strings:
import Data.Char
type Token = String
tokenize :: String -> [Token]
{- Here, a token is either a string of alphanumeric characters, or else one
- non-spacing printable character, such as "(" or ")".
-}
tokenize [] = []
tokenize (x:xs) | isSpace x = tokenize xs
| not (isPrint x) = error $
"Invalid character " ++ show x ++ " in input."
| not (isAlphaNum x) = [x]:(tokenize xs)
| otherwise = let (token, rest) = span isAlphaNum (x:xs)
in token:(tokenize rest)
It turns the example into ["(","Y","(","No","(","V","q",")",")","(","V","p",")",")"]. Note that you have access to the entire repertoire of Unicode.
The main function that evaluates this interactively might look like:
main = interact ( unlines . map show . map evaluate . parse . tokenize )
Where parse turns a list of tokens into a list of ASTs and evaluate turns an AST into a printable expression.
As for implementing the parser, your language appears to have similar syntax to LISP, which is one of the simplest languages to parse; you don’t even need precedence rules. A recursive-descent parser could do it, and is probably the easiest to implement by hand. You can pattern-match on parse ("(":xs) =, but pattern-matching syntax can also implement lookahead very easily, for example parse ("(":x1:xs) = to look ahead one token.
If you’re calling the parser recursively, you would define a helper function that consumes only a single expression, and that has a type signature like :: [Token] -> (AST, [Token]). This lets you parse the inner expression, check that the next token is ")", and proceed with the parse. However, externally, you’ll want to consume all the tokens and return an AST or a list of them.
The stylish way to write a parser is with monadic parser combinators. (And maybe someone will post an example of one.) The industrial-strength solution would be a library like Parsec, but that’s probably overkill here. Still, parsing is (mostly!) a solved problem, and if you just want to get the assignment done on time, using a library off the shelf is a good idea.
the read part of a REPL interpreter typically looks like this
repl :: ForthState -> IO () -- parser definition
repl state
= do putStr "> " -- puts a > character to indicate it's waiting for input
input <- getLine -- this is what you're looking for, to read a line.
if input == "quit" -- allows user to quit the interpreter
then do putStrLn "Bye!"
return ()
else let (is, cs, d, output) = eval (words input) state -- your grammar definition is somewhere down the chain when eval is called on input
in do mapM_ putStrLn output
repl (is, cs, d, [])
main = do putStrLn "Welcome to your very own interpreter!"
repl initialForthState -- runs the parser, starting with read
your eval method will have various loops, stack manipulations, conditionals, etc to actually figure out what the user inputted. hope this helps you with at least the reading input part.
The example code below appears to work nicely:
open FParsec
let capitalized : Parser<unit,unit> =(asciiUpper >>. many asciiLower >>. eof)
let inverted : Parser<unit,unit> =(asciiLower >>. many asciiUpper >>. eof)
let capsOrInvert =choice [capitalized;inverted]
You can then do:
run capsOrInvert "Dog";;
run capsOrInvert "dOG";;
and get a success or:
run capsOrInvert "dog";;
and get a failure.
Now that I have a ParserResult, how do I do things with it? For example, print the string backwards?
There are several notable issues with your code.
First off, as noticed in #scrwtp's answer, your parser returns unit. Here's why: operator (>>.) returns only the result returned by the right inner parser. On the other hand, (.>>) would return the result of a left parser, while (.>>.) would return a tuple of both left and right ones.
So, parser1 >>. parser2 >>. eof is essentially (parser1 >>. parser2) >>. eof.
The code in parens completely ignores the result of parser1, and the second (>>.) then ignores the entire result of the parser in parens. Finally, eof returns unit, and this value is being returned.
You may need some meaningful data returned instead, e.g. the parsed string. The easiest way is:
let capitalized = (asciiUpper .>>. many asciiLower .>> eof)
Mind the operators.
The code for inverted can be done in a similar manner.
This parser would be of type Parser<(char * char list), unit>, a tuple of first character and all the remaining ones, so you may need to merge them back. There are several ways to do that, here's one:
let mymerge (c1: char, cs: char list) = c1 :: cs // a simple cons
let pCapitalized = capitalized >>= mymerge
The beauty of this code is that your mymerge is a normal function, working with normal char's, it knows nothing about parsers or so. It just works with the data, and (>>=) operator does the rest.
Note, pCapitalized is also a parser, but it returns a single char list.
Nothing stops you from applying further transitions. As you mentioned printing the string backwards:
let pCapitalizedAndReversed =
capitalized
>>= mymerge
>>= List.rev
I have written the code in this way for purpose. In different lines you see a gradual transition of your domain data, still within the paradigm of Parser. This is an important consideration, because any subsequent transition may "decide" that the data is bad for some reason and raise a parsing exception, for example. Or, alternatively, it may be merged with other parser.
As soon as your domain data (a parsed-out word) is complete, you extract the result as mentioned in another answer.
A minor note. choice is superfluous for only two parsers. Use (<|>) instead. From experience, careful choosing parser combinators is important because a wrong choice deep inside your core parser logic can easily make your parsers dramatically slow.
See FParsec Primitives for further details.
ParserResult is a discriminated union. You simply match the Success and Failure cases.
let r = run capsOrInvert "Dog"
match r with
| Success(result, _, _) -> printfn "Success: %A" result
| Failure(errorMsg, _, _) -> printfn "Failure: %s" errorMsg
But this is probably not what you find tricky about your situation.
The thing about your Parser<unit, unit> type is that the parsed value is of type unit (the first type argument to Parser). What this means is that this parser doesn't really produce any sensible output for you to use - it can only tell you whether it can parse a string (in which case you get back a Success ((), _, _) - carrying the single value of type unit) or not.
What do you expect to get out of this parser?
Edit: This sounds close to what you want, or at least you should be able to pick up some pointers from it. capitalized accepts capitalized strings, inverted accepts capitalized strings that have been reversed and reverses them as part of the parser logic.
let reverse (s: string) =
System.String(Array.rev (Array.ofSeq s))
let capitalized : Parser<string,unit> =
(asciiUpper .>>. manyChars asciiLower)
|>> fun (upper, lower) -> string upper + lower
let inverted : Parser<string,unit> =
(manyChars asciiLower .>>. asciiUpper)
|>> fun (lower, upper) -> reverse (lower + string upper)
let capsOrInvert = choice [capitalized;inverted]
run capsOrInvert "Dog"
run capsOrInvert "doG"
run capsOrInvert "dog"
Using Parsec, I'm able to write a function of type String -> Maybe MyType with relative ease. I would now like to create a Read instance for my type based on that; however, I don't understand how readsPrec works or what it is supposed to do.
My best guess right now is that readsPrec is used to build a recursive parser from scratch to traverse a string, building up the desired datatype in Haskell. However, I already have a very robust parser who does that very thing for me. So how do I tell readsPrec to use my parser? What is the "operator precedence" parameter it takes, and what is it good for in my context?
If it helps, I've created a minimal example on Github. It contains a type, a parser, and a blank Read instance, and reflects quite well where I'm stuck.
(Background: The real parser is for Scheme.)
However, I already have a very robust parser who does that very thing for me.
It's actually not that robust, your parser has problems with superfluous parentheses, it won't parse
((1) (2))
for example, and it will throw an exception on some malformed inputs, because
singleP = Single . read <$> many digit
may use read "" :: Int.
That out of the way, the precedence argument is used to determine whether parentheses are necessary in some place, e.g. if you have
infixr 6 :+:
data a :+: b = a :+: b
data C = C Int
data D = D C
you don't need parentheses around a C 12 as an argument of (:+:), since the precedence of application is higher than that of (:+:), but you'd need parentheses around C 12 as an argument of D.
So you'd usually have something like
readsPrec p = needsParens (p >= precedenceLevel) someParser
where someParser parses a value from the input without enclosing parentheses, and needsParens True thing parses a thing between parentheses, while needsParens False thing parses a thing optionally enclosed in parentheses [you should always accept more parentheses than necessary, ((((((1)))))) should parse fine as an Int].
Since the readsPrec p parsers are used to parse parts of the input as parts of the value when reading lists, tuples etc., they must return not only the parsed value, but also the remaining part of the input.
With that, a simple way to transform a parsec parser to a readsPrec parser would be
withRemaining :: Parser a -> Parser (a, String)
withRemaining p = (,) <$> p <*> getInput
parsecToReadsPrec :: Parser a -> Int -> ReadS a
parsecToReadsPrec parsecParser prec input
= case parse (withremaining $ needsParens (prec >= threshold) parsecParser) "" input of
Left _ -> []
Right result -> [result]
If you're using GHC, it may however be preferable to use a ReadPrec / ReadP parser (built using Text.ParserCombinators.ReadP[rec]) instead of a parsec parser and define readPrec instead of readsPrec.