I am new to both Haskell and Parsec. In an effort to learn more about the language and that library in particular I am trying to create a parser that can parse Lua saved variable files. In these files variables can take the following forms:
varname = value
varname = {value, value,...}
varname = {{value, value},{value,value,...}}
I've created parsers for each of these forms, but when I combine them with the choice operator <|>, I get a type error:
Couldn't match expected type `[Char]' against inferred type `Char'
Expected type: GenParser Char st [[[Char]]]
Inferred type: GenParser Char st [[Char]]
In the first argument of `try', namely `lList'
In the first argument of `(<|>)', namely `try lList'
My assumption is (although I can't find it in the documentation) that each parser passed to the choice operator must return the same type.
Here's the code in question:
data Variable = LuaString ([Char], [Char])
              | LuaList ([Char], [[Char]])
              | NestedLuaList ([Char], [[[Char]]])
              deriving (Show)
main :: IO ()
main = do
  case (parse varName "" "variable = {{1234,\"Josh\"},{123,222}}") of
    Left err -> print err
    Right xs -> print xs
varName :: GenParser Char st Variable
varName = do{
    vName <- (many letter);
    eq <- string " = ";
    vCon <- try nestList
        <|> try lList
        <|> varContent;
    return (vName, vCon)}
varContent :: GenParser Char st [Char]
varContent = quotedString
         <|> many1 letter
         <|> many1 digit
quotedString :: GenParser Char st [Char]
quotedString = do{
s1 <- string "\"";
s2 <- varContent;
s3 <- string "\"";
return (s1++s2++s3)}
lList :: GenParser Char st [[Char]]
lList = between (string "{") (string "}") (sepBy varContent (string ","))
nestList :: GenParser Char st [[[Char]]]
nestList = between (string "{") (string "}") (sepBy lList (string ","))
That's correct.
(<|>) :: (Alternative f) => f a -> f a -> f a
Notice how both arguments are exactly the same type.
I don't exactly understand your Variable data type. This is the way I would do it:
data LuaValue = LuaString String | LuaList [LuaValue]
data Binding = Binding String LuaValue
This allows values to be arbitrarily nested, not just nested two levels deep like yours. Then write:
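luaValue :: GenParser Char st LuaValue
luaValue = (LuaString <$> identifier)
       <|> (LuaList <$> between (string "{") (string "}") (sepBy luaValue (string ",")))
-- a simple stand-in for an identifier parser: a run of letters and digits
identifier :: GenParser Char st String
identifier = many1 alphaNum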
This is the parser for luaValue. Then you just need to write:
binding :: GenParser Char st Binding
content :: GenParser Char st [Binding]
And you'll have it. Using a data type that accurately represents what is possible is important.
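Since binding and content are left as an exercise above, here is one possible way to fill them in. It is a sketch under assumptions that are mine, not the answer's: a bare run of letters and digits (the hypothetical atom parser) stands in for identifier, bindings use the exact " = " separator from the question, and the file is assumed to hold one binding per line.

```haskell
import Text.ParserCombinators.Parsec

-- The value type suggested above: plain values, or lists nested to any depth.
data LuaValue = LuaString String | LuaList [LuaValue]
  deriving (Show, Eq)

-- A variable name bound to a value.
data Binding = Binding String LuaValue
  deriving (Show, Eq)

-- Assumption: a bare run of letters/digits stands in for `identifier`.
atom :: GenParser Char st String
atom = many1 alphaNum

-- Either a plain value or a brace-delimited, comma-separated list of values.
luaValue :: GenParser Char st LuaValue
luaValue = (LuaString <$> atom)
       <|> (LuaList <$> between (char '{') (char '}') (luaValue `sepBy` char ','))

-- name = value, using the exact " = " separator from the question.
binding :: GenParser Char st Binding
binding = Binding <$> many1 letter <* string " = " <*> luaValue

-- Assumption: one binding per line; the whole input must be consumed.
content :: GenParser Char st [Binding]
content = (binding `sepEndBy` newline) <* eof

main :: IO ()
main = print (parse content "" "a = 1\nb = {x,{y,z}}\n")
```

Run as written, main prints Right [Binding "a" (LuaString "1"),Binding "b" (LuaList [LuaString "x",LuaList [LuaString "y",LuaString "z"]])], so the two-level nesting needs no special case.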
Indeed, parsers passed to the choice operator must have equal types. You can tell by the type of the choice operator:
(<|>) :: GenParser tok st a -> GenParser tok st a -> GenParser tok st a
This says that it will happily combine two parsers as long as their token types, state types and result types are the same.
So how do we make sure those parsers you're trying to combine have the same result type? Well, you already have a datatype Variable that captures the different forms of variables that can appear in Lua, so what we need to do is not return String, [String] or [[String]] but just Variables.
But when we try that we run into a problem. We can't let nestList etc. return Variables yet, because the constructors of Variable require variable names and we don't know those yet at that point. There are workarounds for this (such as returning a function String -> Variable that still expects the variable name), but there is a better solution: separate the variable name from the different kinds of values that a variable can have.
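For comparison, the workaround mentioned first can be sketched with the question's original constructors: each alternative returns a function String -> Variable that still waits for the name, so all three branches share one result type. This is a minimal illustration rather than the answer's code; value is a hypothetical name, Eq is added only so results can be compared, and quotedString is left out to keep it short.

```haskell
import Text.ParserCombinators.Parsec

-- The question's original constructors (Eq added for comparisons).
data Variable = LuaString (String, String)
              | LuaList (String, [String])
              | NestedLuaList (String, [[String]])
              deriving (Show, Eq)

-- quotedString omitted for brevity.
varContent :: GenParser Char st String
varContent = many1 letter <|> many1 digit

lList :: GenParser Char st [String]
lList = between (string "{") (string "}") (sepBy varContent (string ","))

nestList :: GenParser Char st [[String]]
nestList = between (string "{") (string "}") (sepBy lList (string ","))

-- Every alternative has the same result type: a function that still
-- expects the variable name.
value :: GenParser Char st (String -> Variable)
value = try ((\vs name -> NestedLuaList (name, vs)) <$> nestList)
    <|> try ((\vs name -> LuaList (name, vs)) <$> lList)
    <|> ((\v name -> LuaString (name, v)) <$> varContent)

varName :: GenParser Char st Variable
varName = do
  vName <- many letter
  _ <- string " = "
  mk <- value
  return (mk vName)  -- supply the name once it is known

main :: IO ()
main = print (parse varName "" "variable = {{1234,Josh},{123,222}}")
```

The lambdas work, but they only move the problem around; the separate Value type introduced next removes the need for them.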
data Variable = Variable String Value
                deriving Show
data Value = LuaString String
           | LuaList [Value]
           deriving (Show)
Note that I've removed the NestedLuaList constructor. I've changed LuaList to accept a list of Values rather than Strings, so a nested list can now be expressed as a LuaList of LuaLists. This allows lists to be nested arbitrarily deep rather than just two levels as in your example. I don't know if this is allowed in Lua but it made writing the parsers easier. :-)
Now we can let lList and nestList return Values:
lList :: GenParser Char st Value
lList = do
  ss <- between (string "{") (string "}") (sepBy varContent (string ","))
  return (LuaList (map LuaString ss))
nestList :: GenParser Char st Value
nestList = do
  vs <- between (string "{") (string "}") (sepBy lList (string ","))
  return (LuaList vs)
And varName, which I've renamed variable here, now returns a Variable:
variable :: GenParser Char st Variable
variable = do
  vName <- (many letter)
  eq <- string " = "
  vCon <- try nestList
      <|> try lList
      <|> (do v <- varContent; return (LuaString v))
  return (Variable vName vCon)
I think you'll find that when you run your parser on some input there are still some problems, but you're already a lot closer to the solution now than before.
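Two of the remaining problems are easy to see with the question's input: quotedString keeps the quote characters in the result, and string " = " only accepts exactly one space on each side. One possible way to address both is sketched below under my own assumptions (hypothetical lexeme, symbol and quoted helpers; no escape sequences inside strings; names made of letters only): skip trailing whitespace after every token, and parse values with a single recursive parser, which also matches the arbitrarily nested Value type.

```haskell
import Text.ParserCombinators.Parsec

data Variable = Variable String Value
  deriving (Show, Eq)
data Value = LuaString String
           | LuaList [Value]
  deriving (Show, Eq)

-- Hypothetical helper: run a parser, then skip any trailing whitespace.
lexeme :: GenParser Char st a -> GenParser Char st a
lexeme p = p <* spaces

symbol :: String -> GenParser Char st String
symbol = lexeme . string

-- A quoted string without its surrounding quotes (no escape handling).
quoted :: GenParser Char st String
quoted = between (char '"') (char '"') (many (noneOf "\""))

-- One recursive parser covers plain values and lists nested to any depth.
value :: GenParser Char st Value
value = lexeme (LuaString <$> (quoted <|> many1 alphaNum))
    <|> (LuaList <$> between (symbol "{") (symbol "}") (value `sepBy` symbol ","))

variable :: GenParser Char st Variable
variable = Variable <$> lexeme (many1 letter) <* symbol "=" <*> value

main :: IO ()
main = print (parse (spaces *> variable <* eof) "" "variable = {{1234,\"Josh\"},{123,222}}")
```

With this version, "Josh" comes back without its quotes, and inputs such as x={ a , "b c" } parse as well.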
I hope this helps!
I'm practicing writing parsers, using Tsoding's JSON parser video as a reference. I'm trying to extend it to parse arithmetic expressions of arbitrary length, and I have come up with the following AST.
data HVal
  = HInteger Integer -- No Support For Floats
  | HBool Bool
  | HNull
  | HString String
  | HChar Char
  | HList [HVal]
  | HObj [(String, HVal)]
  deriving (Show, Eq, Read)
data Op -- There's only one operator for the sake of brevity at the moment.
  = Add
  deriving (Show, Read)
newtype Parser a = Parser
  { runParser :: String -> Maybe (String, a)
  }
The following functions are my attempt at implementing the operator parser.
ops :: [Char]
ops = ['+']
isOp :: Char -> Bool
isOp c = elem c ops
spanP :: (Char -> Bool) -> Parser String
spanP f = Parser $ \input -> let (token, rest) = span f input
                             in Just (rest, token)
opLiteral :: Parser String
opLiteral = spanP isOp
sOp :: String -> Op
sOp "+" = Add
sOp _ = undefined
parseOp :: Parser Op
parseOp = sOp <$> (charP '"' *> opLiteral <* charP '"')
The logic above is similar to how strings are parsed, so my assumption was that the only difference would be looking specifically for an operator between quotation marks, rather than for any run of characters other than a quotation mark. It does seem to begin parsing correctly, but then it gives me the following error:
λ > runParser parseOp "\"+\""
Just ("+\"",*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
undefined, called at /DIRECTORY/parser.hs:110:11 in main:Main
I'm confused about where the error is occurring. I assume it has to do with sOp, mainly because the other functions work as intended and the rest of parseOp is a translation of the parseString function:
stringLiteral :: Parser String
stringLiteral = spanP (/= '"')
parseString :: Parser HVal
parseString = HString <$> (charP '"' *> stringLiteral <* charP '"')
The only reason I have sOp, however, is that if I replaced it with, say, Op, I would get an error saying that Op :: String -> Op doesn't exist. Given that, my inclination was that the string produced by the parsed expression would be passed into this function, where I could return the appropriate operator. This, however, seems to be incorrect, and I'm not sure how to proceed.
charP and Applicative Instance
charP :: Char -> Parser Char
charP x = Parser $ f
  where f (y:ys)
          | y == x = Just (ys, x)
          | otherwise = Nothing
        f [] = Nothing
instance Applicative Parser where
  pure x = Parser $ \input -> Just (input, x)
  (Parser p) <*> (Parser q) = Parser $ \input -> do
    (input', f) <- p input
    (input', a) <- q input
    Just (input', f a)
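The implementation of (<*>) is the culprit. You did not use input' in the next call to q, but used input instead. As a result, every parser in the chain runs on the original string rather than on what the previous parser left over, so no characters are "eaten". In your example, opLiteral therefore sees "\"+\"", which starts with a quote, so span isOp returns the empty string and sOp "" falls through to sOp _ = undefined. GHCi prints the leftover input "+\"" (what the last charP '"' leaves after consuming one quote from the original string) before it forces the result, which is why the exception appears halfway through the output. You can fix this with: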
instance Applicative Parser where
  pure x = Parser $ \input -> Just (input, x)
  (Parser p) <*> (Parser q) = Parser $ \input -> do
    (input', f) <- p input
    (input'', a) <- q input'
    Just (input'', f a)
With the updated instance for Applicative, we get:
*Main> runParser parseOp "\"+\""
Just ("",Add)
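For reference, here is a minimal self-contained version of the pieces above with the corrected <*>. The question does not show its Functor instance, which GHC requires before Applicative, so the one below is an assumption. It also swaps undefined in sOp for an error that names the offending string; with the original <*>, that message would have reported that sOp received an empty string, which points straight at the unconsumed input.

```haskell
data Op = Add
  deriving (Show, Eq)

newtype Parser a = Parser { runParser :: String -> Maybe (String, a) }

-- Assumed Functor instance (not shown in the question).
instance Functor Parser where
  fmap f (Parser p) = Parser $ \input -> do
    (rest, x) <- p input
    Just (rest, f x)

instance Applicative Parser where
  pure x = Parser $ \input -> Just (input, x)
  Parser p <*> Parser q = Parser $ \input -> do
    (input', f) <- p input    -- run the first parser
    (input'', a) <- q input'  -- continue from what it left over
    Just (input'', f a)

charP :: Char -> Parser Char
charP x = Parser f
  where
    f (y:ys) | y == x = Just (ys, x)
    f _ = Nothing

spanP :: (Char -> Bool) -> Parser String
spanP f = Parser $ \input -> let (token, rest) = span f input in Just (rest, token)

isOp :: Char -> Bool
isOp c = c `elem` "+"

-- A descriptive error instead of undefined shows what actually reached sOp.
sOp :: String -> Op
sOp "+" = Add
sOp s = error ("unknown operator: " ++ show s)

parseOp :: Parser Op
parseOp = sOp <$> (charP '"' *> spanP isOp <* charP '"')

main :: IO ()
main = print (runParser parseOp "\"+\"")
```

Here *> and <* come from their default definitions in terms of <*> and fmap, so fixing <*> is enough to fix them as well.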
I have been reading a tutorial about parser combinators and I came across a function which I would like a bit of help in trying to understand.
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = item `bind` \c ->
  if p c
  then unit c
  else (Parser (\cs -> []))
char :: Char -> Parser Char
char c = satisfy (c ==)
natural :: Parser Integer
natural = read <$> some (satisfy isDigit)
string :: String -> Parser String
string [] = return []
string (c:cs) = do { char c; string cs; return (c:cs)}
My question is how the string function works, or rather how it terminates. Say I did something like:
let while_parser = string "while"
and then used it to parse a string, for example
parse while_parser "while if". It correctly parses the "while".
However, if I try something like parse while_parser "test", it returns [].
My question is: how does it fail? What happens when char c returns an empty list?
Let's say your Parser is defined like this:
newtype Parser a = Parser { runParser :: String -> [(a,String)] }
Then your Monad instance would be defined something like this:
instance Monad Parser where
  return x = Parser $ \input -> [(x, input)]
  p >>= f = Parser $ \input -> concatMap (\(x,s) -> runParser (f x) s) (runParser p input)
You're wondering what happens when char c fails in this line of code:
string (c:cs) = do { char c; string cs; return (c:cs) }
First, let's desugar it:
string (c:cs) = char c >>= \_ -> string cs >>= \_ -> return (c:cs)
Now the part of interest is char c >>= \_ -> string cs. From the definition of char and subsequently the definition of satisfy we see that ultimately runParser (char c) input will evaluate to [] when char c fails. Look at the definition of >>= when p is char c. concatMap won't have any work to do because the list will be empty! Thus any calls to >>= from then on will just encounter an empty list and pass it along.
One of the wonderful things about referential transparency is that you can write down your expression and evaluate it by substituting definitions and doing the function applications by hand.
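To check that substitution by running code, here is a small sketch of the list-based Parser from this answer. The tutorial's item, unit and bind are not shown in the question, so item is an assumed definition, and unit/bind are written as return/>>=. Current GHC versions also require Functor and Applicative instances before Monad, which is why they are included.

```haskell
newtype Parser a = Parser { runParser :: String -> [(a, String)] }

instance Functor Parser where
  fmap f p = Parser $ \input -> [(f x, s) | (x, s) <- runParser p input]

instance Applicative Parser where
  pure x = Parser $ \input -> [(x, input)]
  pf <*> px = Parser $ \input ->
    [(f x, s2) | (f, s1) <- runParser pf input, (x, s2) <- runParser px s1]

-- An empty result list from p leaves concatMap nothing to do.
instance Monad Parser where
  p >>= f = Parser $ \input -> concatMap (\(x, s) -> runParser (f x) s) (runParser p input)

-- Assumed definition of item: consume one character, fail on empty input.
item :: Parser Char
item = Parser $ \input -> case input of
  (c:cs) -> [(c, cs)]
  []     -> []

satisfy :: (Char -> Bool) -> Parser Char
satisfy p = item >>= \c -> if p c then return c else Parser (const [])

char :: Char -> Parser Char
char c = satisfy (c ==)

-- Recursion stops at string [], which succeeds without consuming anything.
string :: String -> Parser String
string [] = return []
string (c:cs) = do { _ <- char c; _ <- string cs; return (c:cs) }

main :: IO ()
main = do
  print (runParser (string "while") "while if")
  print (runParser (string "while") "test")
```

The first call prints [("while"," if")]; the second prints [], because char 'w' already fails on 't' and every later >>= just passes the empty list along.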
I want to parse Float values from a file where they are stored using a comma as the decimal separator. Thus I need a function myParse :: String -> Float such that, for instance, myParse "23,46" == 23.46.
I have some ideas about how to do this, but they all seem overcomplicated, for example:
replace , with a . in the string and use read; or
follow this FP Complete blogpost (entitled Parsing Floats With Parsec), and challenge the curse of the monomorphism restriction.
Is there a simpler way, or do I really need to use a parsing library? In the second case, could you please paste some suggestions in order to get me started? The monomorphism restriction scares me, and I believe that there has to be a way to do this without using language extensions.
Replacing , by . and then calling read is straightforward enough; you just need to remember to use your own specialized function instead of plain old read:
readFloatWithComma :: String -> Float
readFloatWithComma = read . sanitize
  where
    sanitize = map (\c -> if c == ',' then '.' else c)
In GHCi:
λ> readFloatWithComma "23,46"
23.46
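read throws an exception on malformed input. If the file might contain bad values, a variant built on readMaybe from Text.Read returns Nothing instead; the name readFloatWithCommaMaybe is only illustrative. Haskell's read syntax requires a digit before the decimal point, so an input such as ",5" also gives Nothing.

```haskell
import Text.Read (readMaybe)

-- Like readFloatWithComma, but returns Nothing instead of throwing
-- when the sanitized string is not a valid Float.
readFloatWithCommaMaybe :: String -> Maybe Float
readFloatWithCommaMaybe = readMaybe . map (\c -> if c == ',' then '.' else c)

main :: IO ()
main = do
  print (readFloatWithCommaMaybe "23,46")
  print (readFloatWithCommaMaybe "23,4x")
```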
Regarding the parsec approach: despite what the article you link to suggests, the monomorphism restriction need not be a worry as long as you have type signatures for all your top-level bindings. In particular, the following code doesn't need any language extensions to compile properly (at least in GHC 7.10.1):
import Text.Parsec
import Text.Parsec.String ( Parser )
import Control.Applicative hiding ( (<|>) )
infixr 5 <++>
(<++>) :: Applicative f => f [a] -> f [a] -> f [a]
a <++> b = (++) <$> a <*> b
infixr 5 <:>
(<:>) :: Applicative f => f a -> f [a] -> f [a]
a <:> b = (:) <$> a <*> b
number :: Parser String
number = many1 digit
plus :: Parser String
plus = char '+' *> number
minus :: Parser String
minus = char '-' <:> number
integer :: Parser String
integer = plus <|> minus <|> number
float :: Parser Float
float = fmap rd $ integer <++> decimal <++> exponent
  where rd       = read :: String -> Float
        decimal  = option "" $ ('.' <$ char ',') <:> number
        exponent = option "" $ oneOf "eE" <:> integer
In GHCi:
λ> parseTest float "23,46"
23.46
I'm working on a parser for math expressions and variables, written with Happy. The problem is that I don't know how to save the value of a variable and use it later. Any ideas?
This is how I recognize expressions and variables assignment:
genExp : exp { $1 }
| variable '=' exp { //here I want to save the value of the variable; something like this: insert variables $1 $3, where 'variables' is a Data.Map }
An expression can contain a variable. For example:
a = 2 + 1
a + 2 (now the parser must print 5)
I need to save the value of the variable 'a' when the parser is parsing the line 'a = 2 + 1' and to get the value of the variable 'a' when the parser is parsing the line 'a + 2'
What you want to do is to keep track of the value of variables during evaluation of expressions, not during parsing. Let's assume you parse your expressions into the following types:
data Expr = Literal Int | Variable Var | Assign Var Expr | Add Expr Expr | ...
newtype Var = Var String deriving (Ord, Eq, Show)
Then you could simply pass a Map around your evaluation function with the current value of all variables:
import qualified Data.Map as M
import Control.Monad.State
data Expr = Literal Int | Variable Var | Assign Var Expr | Add Expr Expr
newtype Var = Var String deriving (Ord, Eq, Show)
-- Each Expr corresponds to a single line in your language, so
-- a = 2+1
-- a + 2
-- corresponds to
-- [Assign (Var "a") (Add (Literal 2) (Literal 1)),
--  Add (Variable (Var "a")) (Literal 2)]
eval :: [Expr] -> Int
eval es = last $ evalState (mapM eval' es) M.empty -- M.empty :: M.Map Var Int
  where
    -- the state holds the current value of every assigned variable
    eval' :: Expr -> State (M.Map Var Int) Int
    eval' (Literal n) = return n
    eval' (Variable v) = do
      vs <- get
      case M.lookup v vs of
        Just x -> return x
        _ -> error $ "variable " ++ show v ++ " is undefined!"
    eval' (Assign v ex) = do
      x <- eval' ex
      modify (M.insert v x)
      return x
    eval' (Add a b) = do
      x <- eval' a
      y <- eval' b
      return (x+y)
Of course, there's nothing to prevent you from evaluating expressions as you parse them, eliminating the need for an abstract syntax tree such as this. The general idea is the same there; you'll need to keep some state with you during the entire parsing, that keeps track of the current value of all your variables.
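As a variation on the State-monad version, the same idea can be written with the Map threaded explicitly using mapAccumL from Data.List. This sketch (evalLine, evalAll and Env are hypothetical names) keeps every line's value instead of only the last one, so a driver can print 3 for a = 2 + 1 and then 5 for a + 2:

```haskell
import Data.List (mapAccumL)
import qualified Data.Map as M

data Expr = Literal Int | Variable Var | Assign Var Expr | Add Expr Expr
newtype Var = Var String deriving (Ord, Eq, Show)

-- Current value of every assigned variable.
type Env = M.Map Var Int

-- Evaluate one line, returning the updated environment and the line's value.
evalLine :: Env -> Expr -> (Env, Int)
evalLine env (Literal n) = (env, n)
evalLine env (Variable v) =
  case M.lookup v env of
    Just x  -> (env, x)
    Nothing -> error ("variable " ++ show v ++ " is undefined!")
evalLine env (Assign v ex) =
  let (env', x) = evalLine env ex
  in (M.insert v x env', x)
evalLine env (Add a b) =
  let (env1, x) = evalLine env a
      (env2, y) = evalLine env1 b
  in (env2, x + y)

-- Thread the environment through all lines, keeping every line's result.
evalAll :: [Expr] -> [Int]
evalAll = snd . mapAccumL evalLine M.empty

main :: IO ()
main = mapM_ print (evalAll
  [ Assign (Var "a") (Add (Literal 2) (Literal 1))
  , Add (Variable (Var "a")) (Literal 2) ])
```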
I'm trying to do the following:
Parse text of the form:
Some Text #{0,0,0} some Text #{0,0,0}#{0,0,0} more Text #{0,0,0}
into a list of some data structure:
[Inside "Some Text ",Outside (0,0,0),Inside " some Text ",Outside (0,0,0),Outside (0,0,0),Inside " more Text ",Outside (0,0,0)]
So these #{a,b,c} bits should be turned into something different from the rest of the text.
I have this code:
module ParsecTest where
import Text.ParserCombinators.Parsec
import Control.Monad
type Reference = (Int, Int, Int)
data Transc = Inside String | Outside Reference
              deriving (Show)
text :: Parser Transc
text = do
  x <- manyTill anyChar ((lookAhead reference) <|> (eof >> return (Inside "")));
  return (Inside x)
transc = reference <|> text
alot :: Parser [Transc]
alot = do
  manyTill transc eof
reference :: Parser Transc
reference = try (do{ char '#';
                     char '{';
                     a <- number;
                     char ',';
                     b <- number;
                     char ',';
                     c <- number;
                     char '}';
                     return (Outside (a,b,c)) })
number :: Parser Int
number = do{ x <- many1 digit;
             return (read x) }
This works as expected. You can test this in ghci by typing
parseTest alot "Some Text #{0,0,0} some Text #{0,0,0}#{0,0,0} more Text #{0,0,0}"
But I don't think it's very nice.
1) Is the use of lookAhead really necessary for my problem?
2) Is the return (Inside "") an ugly hack?
3) Is there generally a more concise/smarter way to achieve the same?
1) I think you do need lookAhead as you need the result of that parse. It would be nice to avoid running that parser twice by having a Parser (Transc,Maybe Transc) to indicate an Inside with an optional following Outside. If performance is an issue, then this is worth doing.
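2) Yes. manyTill discards the result of its end parser, so Inside "" only serves to give both alternatives the same type; mapping both to () (as in text2 below) says that more directly.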
3) Applicatives
number2 :: Parser Int
number2 = read <$> many1 digit
text2 :: Parser Transc
text2 = (Inside .) . (:)
    <$> anyChar
    <*> manyTill anyChar (try (lookAhead reference2) *> pure () <|> eof)
reference2 :: Parser Transc
reference2 = ((Outside .) .) . (,,)
    <$> (string "#{" *> number2 <* char ',')
    <*> number2
    <*> (char ',' *> number2 <* char '}')
-- try lets a malformed "#{..." fall back to text2 instead of failing the whole parse
transc2 = try reference2 <|> text2
alot2 = many transc2
You may want to rewrite the beginning of reference2 using a helper like aux x y z = Outside (x,y,z).
EDIT: Changed text to deal with inputs that don't end with an Outside.
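Point 1 above suggests returning the text together with the reference that ended it, so that a successful reference is kept instead of being looked ahead at and then parsed again. Below is a sketch of that idea, assuming the same Reference and Transc types (with Eq added) and using the aux helper mentioned above; textThenRef, piece and alot3 are hypothetical names. reference is still attempted at each character, just as with lookAhead; what goes away is the second parse of every reference that succeeds.

```haskell
import Data.Maybe (maybeToList)
import Text.Parsec
import Text.Parsec.String (Parser)

type Reference = (Int, Int, Int)
data Transc = Inside String | Outside Reference
  deriving (Show, Eq)

number :: Parser Int
number = read <$> many1 digit

-- try makes a malformed "#{..." fail without consuming input.
reference :: Parser Transc
reference = try (aux <$> (string "#{" *> number <* char ',')
                     <*> number
                     <*> (char ',' *> number <* char '}'))
  where
    aux x y z = Outside (x, y, z)

-- A run of text (at least one character), plus the reference that ended it,
-- or Nothing if the run ended at end of input.
textThenRef :: Parser (Transc, Maybe Transc)
textThenRef = anyChar >>= \c -> go [c]
  where
    -- acc holds the characters seen so far, in reverse order
    go :: String -> Parser (Transc, Maybe Transc)
    go acc = ((\r -> (Inside (reverse acc), Just r)) <$> reference)
         <|> ((Inside (reverse acc), Nothing) <$ eof)
         <|> (anyChar >>= \c -> go (c : acc))

-- Either a reference on its own, or a text run with its terminating reference.
piece :: Parser [Transc]
piece = ((: []) <$> reference)
    <|> ((\(t, r) -> t : maybeToList r) <$> textThenRef)

alot3 :: Parser [Transc]
alot3 = concat <$> manyTill piece eof

main :: IO ()
main = print (parse alot3 "" "Some Text #{0,0,0} some Text #{0,0,0}#{0,0,0} more Text #{0,0,0}")
```

On the example input this yields the same list as alot, and a malformed reference such as #{1,x} simply stays part of the surrounding Inside text.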