I have the following language I am trying to parse.
formula ::= true
| false
| var
| formula & formula
| ∀ var . formula
| (formula)
var ::= letter { letter | digit }*
I have been following this article on the Haskell wiki and have used Parsec combinators to create the following parser.
module LogicParser where
import System.IO
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Language
import qualified Text.ParserCombinators.Parsec.Token as Token
-- Data Structures
data Formula = Var String
| TT
| FF
| Con Formula Formula
| Forall String Formula
deriving (Show)
-- Language Definition
lang :: LanguageDef st
lang =
emptyDef{ Token.identStart = letter
, Token.identLetter = alphaNum
, Token.opStart = oneOf "&."
, Token.opLetter = oneOf "&."
, Token.reservedOpNames = ["&", "."]
, Token.reservedNames = ["tt", "ff", "forall"]
}
-- Lexer for langauge
lexer =
Token.makeTokenParser lang
-- Trivial Parsers
identifier = Token.identifier lexer
keyword = Token.reserved lexer
op = Token.reservedOp lexer
roundBrackets = Token.parens lexer
whiteSpace = Token.whiteSpace lexer
-- Main Parser, takes care of trailing whitespaces
formulaParser :: Parser Formula
formulaParser = whiteSpace >> formula
-- Parsing Formulas
formula :: Parser Formula
formula = conFormula
<|> formulaTerm
<|> forallFormula
-- Term in a Formula
formulaTerm :: Parser Formula
formulaTerm = roundBrackets formula
<|> ttFormula
<|> ffFormula
<|> varFormula
-- Conjunction
conFormula :: Parser Formula
conFormula =
buildExpressionParser [[Infix (op "&" >> return Con) AssocLeft]] formula
-- Truth
ttFormula :: Parser Formula
ttFormula = keyword "tt" >> return TT
-- Falsehood
ffFormula :: Parser Formula
ffFormula = keyword "ff" >> return FF
-- Variable
varFormula :: Parser Formula
varFormula =
do var <- identifier
return $ Var var
-- Universal Statement
forallFormula :: Parser Formula
forallFormula =
do keyword "forall"
x <- identifier
op "."
phi <- formulaTerm
return $ Forall x phi
-- For running runghc
calculate :: String -> String
calculate s =
case ret of
Left e -> "Error: " ++ (show e)
Right n -> "Interpreted as: " ++ (show n)
where
ret = parse formulaParser "" s
main :: IO ()
main = interact (unlines . (map calculate) . lines)
The problem is the & operator, I am trying to model it on how the article parses expressions, using the Infix function and passing it a list of operators and a parser to parse terms. However, the parser is not behaving as desired and I can't work out why. Here are some examples of desired behaviour:
true -- parsing --> TT
true & false -- parsing --> Con TT FF
true & (true & false) -- parsing --> Con TT (Con TT FF)
forall x . false & true -- parsing --> Con (Forall "x" FF) TT
However currently the parser is producing no output. I appreciate any assistance.
I think I managed to fix the problem by rephrasing the grammar, and conjunction is still left-associative:
formula :: Parser Formula
formula = conFormula
<|> formulaTerm
formulaTerm :: Parser Formula
formulaTerm = roundBrackets formula
<|> ttFormula
<|> ffFormula
<|> varFormula
<|> forallFormula
conFormula :: Parser Formula
conFormula =
buildExpressionParser [[Infix (op "&" >> return Con) AssocLeft]] formulaTerm
This parsed the following successfully.
The problem in your code is, that you try to parse a formula for which the first attempt is to parse a formulaCon. This then first tries to parse a formula, i.e. you create an infinite recursion without consuming any input.
To resolve this, you have to structure your grammar. Define your terms like this (note that all these terms consume some input before doing a recursion back to formula):
formulaTerm = ttFormula
<|> ffFormula
<|> varFormula
<|> forallFormula
<|> roundBrackets formula
forallFormula = do
keyword "forall"
x <- identifier
op "."
phi <- formula
return $ Forall x phi
A formula is then either a single term or a conjunction consisting of term, an operator, and another formula. To ensure that all input is parsed, first try to parse a conjunction and - if that fails - parse a single term:
formula = (try formulaCon) <|> formulaTerm
Finally a formulaCon can be parsed like this:
formulaCon = do
f1 <- formulaTerm
op "&"
f2 <- formula
return $ Con f1 f2
The drawback with this solution is that conjunctions are now right associative.
Related
For the MVE code below it outputs [] rather than the expected Not (Oper Eq 2 2)) for the input parseString "2+2" which is supposed to call pOper. My guess is that pOper would expect three arguments for the anonymous function to work. That is 3 strings. However due to partial call of a function only one argument is passed. Is there a way to work around to preserve the type signature of pOper while dealing with the Not and at the same time not changing the type definitions?
import Data.Char
import Text.ParserCombinators.ReadP
import Control.Applicative ((<|>))
type Parser a = ReadP a
data Value =
IntVal Int
deriving (Eq, Show, Read)
data Exp =
Const Value
| Oper Op Exp Exp
| Not Exp
deriving (Eq, Show, Read)
data Op = Plus | Minus | Eq
deriving (Eq, Show, Read)
space :: Parser Char
space = satisfy isSpace
spaces :: Parser String
spaces = many space
space1 :: Parser String
space1 = many1 space
symbol :: String -> Parser String
symbol = token . string
token :: Parser a -> Parser a
token combinator = (do spaces
combinator)
parseString input = readP_to_S (do
e <- pExpr
token eof
return e) input
pExpr :: Parser Exp
pExpr = chainl1 pTerm pOper
pTerm :: Parser Exp
pTerm =
(do
pv <- numConst
skipSpaces
return pv)
numConst :: Parser Exp
numConst =
(do
skipSpaces
y <- munch isDigit
return (Const (IntVal (read y)))
)
-- Parser for an operator
pOper :: ReadP (Exp -> Exp -> Exp)
pOper = symbol "+" >> return (Oper Plus)
<|> (symbol "-" >> return (Oper Minus))
<|> (symbol "=" >> return (Oper Eq))
<|> (symbol "!=" >> return (\e1 e2 -> Not (Oper Eq e1 e2)))
There's nothing wrong with your parser for !=. Rather, your parser for operators in general is broken: it only parses the first operator correctly. A simpler version of your pOper would be
pOper = a >> b
<|> (c >> d)
But because of precedence, this isn't the same as (a >> b) <|> (c >> d). Actually, it's a >> (b <|> (c >> d))! So the symbol your first alternative parses is accidentally mandatory. It would parse 2+!=2 instead.
So, you could fix this by just adding in the missing parentheses. But if, like me, you find it a little tacky to rely so much on operator precedence for semantic meaning, consider something that's more obviously safe, using the type system to separate the clauses from the delimiters:
pOper :: ReadP (Exp -> Exp -> Exp)
pOper = asum [ symbol "+" >> return (Oper Plus)
, symbol "-" >> return (Oper Minus)
, symbol "=" >> return (Oper Eq)
, symbol "!=" >> return (\e1 e2 -> Not (Oper Eq e1 e2))
]
This way, you have a list of independent parsers, not a single parser built with alternation. asum (from Control.Applicative) does the work of combining that list into alternatives. It means the same thing, of course, but it means you don't have to learn any operator precedence tables, because , can only be a list item separator.
The best way I can think of to solve the problem is by creating these to modificatoins: 1) this alternative in the expression
pExpr :: Parser Exp
pExpr =
(do pv <- chainl1 pTerm pOper
pv2 <- pOper2 pv
return pv2)
<|> chainl1 pTerm pOper
And 2) this helper function to deal with infix patterns
pOper2 :: Exp -> Parser Exp
pOper2 e1 = (do
symbol "!="
e2 <- numConst
return (Not (Oper Eq e1 e2)))
This is the output, althought I don't know if there will be problems if other operations such as / and * which has different associativety are to be taken into account as well.
parseString "2+4+6"
[(Oper Plus (Oper Plus (Const (IntVal 2)) (Const (IntVal 4))) (Const (IntVal 6)),"")]
ghci> parseString "2+4+6 != 2"
[(Not (Oper Eq (Oper Plus (Oper Plus (Const (IntVal 2)) (Const (IntVal 4))) (Const (IntVal 6))) (Const (IntVal 2))),"")]
ghci> parseString "2 != 4"
[(Not (Oper Eq (Const (IntVal 2)) (Const (IntVal 4))),"")]
Inspired by the following project, I am working with linear expressions and I have defined the following structure and parser.
data AExp
= Lit Rational
| Var String
| AExp :+: AExp
| Rational :*: AExp
deriving (Eq)
import Text.Parsec
import Text.Parsec.Char
import Text.Parsec.Expr
import Text.Parsec.Language (javaStyle)
import Text.Parsec.String
import Control.Monad (void, ap)
import qualified Text.Parsec.Token as Token
Token.TokenParser {..} = Token.makeTokenParser javaStyle
binary name fun = Infix (fun <$ reservedOp name) AssocLeft
whitespace :: Parser ()
whitespace = void $ many $ oneOf " \n\t"
regularParse :: Parser a -> String -> Either ParseError a
regularParse p = parse p ""
rational :: Parser Rational
rational = do
whitespace
num <- many1 digit
void $ char '/'
den <- many1 digit
whitespace
return $ toRational $ (read num)/ (read den)
aexp :: Parser AExp
aexp = buildExpressionParser table term
where term = Lit <$> rational
<|> Var <$> identifier
<|> try ((:*:) <$> (rational <* reservedOp "*") <*> aexp)
<|> try (parens aexp)
table = [ [ binary "+" (:+:)]]
My problem is multiplication (:*:) and in general binary operations between two different types (Rational - AExp). The following example shows my result.
Main>regularParse aexp "10/1 * x"
Right (Lit 10 % 1)
That is, it does not match with multiplication (*), it matches with the literal variable Lit.
I have looked for some examples of parser of expressions, but in which I always found the multiplication is a binary operation between AExp, that is, the structure and parser are of this style:
data AExp
= Lit Rational
| Var String
| AExp:+: AExp
| AExp:*: AExp
deriving (Eq)
aexp :: Parser AExp
aexp = buildExpressionParser table term
where term = Lit <$> rational
<|> Var <$> identifier
<|> try ((:*:) <$> (rational <* reservedOp "*") <*> aexp)
<|> try (parens aexp)
table = [[binary "+" (:+:)],
[binary "*" (:*:)]]
The project I am following is defined in this way.
I could try to take another focus for the parser, but for me it is easier to follow the project guide, since many of my structures are very similar.
How could I define the multiplication parser (:*:) or is there any example with to guide me?
Thanks in advance
The problem is here:
term = Lit <$> rational
<|> ...
<|> try ((:*:) <$> (rational <* reservedOp "*") <*> aexp)
<|> ...
This will try parsing a lone rational first; if that succeeds, the remaining branches will not be attempted. This policy of committing to the first successful parse was chosen for efficiency. Just changing the order will get you over this hump and to whatever your next problem will be (there's always one, isn't there??).
term = try ((:*:) <$> (rational <* reservedOp "*") <*> aexp)
<|> Lit <$> rational
<|> ...
I thought a bit more about my original problem. I want to make a parser for linear expressions.
A limited, but sufficient solution is:
varAExp :: Parser AExp
varAExp = do
x <- identifier
return $ Var x
aexp :: Parser AExp
aexp = buildExpressionParser table term
where term = try ((:*:) <$> (rational <* reservedOp "*") <*> varAExp)
<|> Lit <$> rational
<|> Var <$> identifier
<|> try (parens aexp)
table = [[binary "+" (:+:) ]]
With this parser I can work with expressions of this type:
arit_1 = regularParse aexp "10/1*x + 3/2*y + 1/1 + 3/1"
I cannot express the multiplication of numbers, but as I said before it is enough for me.
thanks for your help
I'm trying to parse a small language with Haskell-like syntax, using parsec-layout. The two key features that don't seem to interact too well with each other are:
Function application syntax is juxtaposition, i.e. if F and E are terms, F E is the syntax for F applied to E.
Indentation can be used to denote nesting, i.e. the following two are equivalent:
X = case Y of
A -> V
B -> W
X = case Y of A -> V; B -> W
I haven't managed to figure out a combination of skipping and keeping whitespace that would allow me to parse a list of such definitions. Here's my simplified code:
import Text.Parsec hiding (space, runP)
import Text.Parsec.Layout
import Control.Monad (void)
type Parser = Parsec String LayoutEnv
data Term = Var String
| App Term Term
| Case Term [(String, Term)]
deriving Show
name :: Parser String
name = spaced $ (:) <$> upper <*> many alphaNum
kw :: String -> Parser ()
kw = void . spaced . string
reserved :: String -> Parser ()
reserved s = try $ spaced $ string s >> notFollowedBy alphaNum
term :: Parser Term
term = foldl1 App <$> part `sepBy1` space
where
part = choice [ caseBlock
, Var <$> name
]
caseBlock = Case <$> (reserved "case" *> term <* reserved "of") <*> laidout alt
alt = (,) <$> (name <* kw "->") <*> term
binding :: Parser (String, Term)
binding = (,) <$> (name <* kw "=") <*> term
-- https://github.com/luqui/parsec-layout/issues/1
trim :: String -> String
trim = reverse . dropWhile (== '\n') . reverse
runP :: Parser a -> String -> Either ParseError a
runP p = runParser (p <* eof) defaultLayoutEnv "" . trim
If I try to run it on input like
s = unlines [ "A = case B of"
, " X -> Y Z"
, "C = D"
]
via runP (laidout binding) s, it fails on the application Y Z:
(line 2, column 10):
expecting space or semi-colon
However, if I change the definition of term to
term = foldl1 App <$> many1 part
then it doesn't stop parsing the term at the start of the (unindented!) third line, leading to
(line 3, column 4):
expecting semi-colon
I think the problem has to do with that name already eliminates the following space, so the sepBy1 in the definition of term doesn't see it.
Consider these simplified versions of term:
term0 = foldl1 App <$> (Var <$> name) `sepBy1` space
term1 = foldl1 App <$> (Var <$> name') `sepBy1` space
name' = (:) <$> upper <*> many alphaNum
term3 = foldl1 App <$> many (Var <$> name)
Then:
runP term0 "A B C" -- fails
runP term1 "A B C" -- succeeds
runP term3 "A B C" -- succeeds
I think part of the solution is to define
part = [ caseBlock, Var <$> name' ]
where name' is as above. However, there are still some issues.
I'm trying to learn Parsec by implementing a small regular expression parser. In BNF, my grammar looks something like:
EXP : EXP *
| LIT EXP
| LIT
I've tried to implement this in Haskell as:
expr = try star
<|> try litE
<|> lit
litE = do c <- noneOf "*"
rest <- expr
return (c : rest)
lit = do c <- noneOf "*"
return [c]
star = do content <- expr
char '*'
return (content ++ "*")
There are some infinite loops here though (e.g. expr -> star -> expr without consuming any tokens) which makes the parser loop forever. I'm not really sure how to fix it though, because the very nature of star is that it consumes its mandatory token at the end.
Any thoughts?
You should use Parsec.Expr.buildExprParser; it is ideal for this purpose. You simply describe your operators, their precedence and associativity, and how to parse an atom, and the combinator builds the parser for you!
You probably also want to add the ability to group terms with parens so that you can apply * to more than just a single literal.
Here's my attempt (I threw in |, +, and ? for good measure):
import Control.Applicative
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
data Term = Literal Char
| Sequence [Term]
| Repeat (Int, Maybe Int) Term
| Choice [Term]
deriving ( Show )
term :: Parser Term
term = buildExpressionParser ops atom where
ops = [ [ Postfix (Repeat (0, Nothing) <$ char '*')
, Postfix (Repeat (1, Nothing) <$ char '+')
, Postfix (Repeat (0, Just 1) <$ char '?')
]
, [ Infix (return sequence) AssocRight
]
, [ Infix (choice <$ char '|') AssocRight
]
]
atom = msum [ Literal <$> lit
, parens term
]
lit = noneOf "*+?|()"
sequence a b = Sequence $ (seqTerms a) ++ (seqTerms b)
choice a b = Choice $ (choiceTerms a) ++ (choiceTerms b)
parens = between (char '(') (char ')')
seqTerms (Sequence ts) = ts
seqTerms t = [t]
choiceTerms (Choice ts) = ts
choiceTerms t = [t]
main = parseTest term "he(llo)*|wor+ld?"
Your grammar is left-recursive, which doesn’t play nice with try, as Parsec will repeatedly backtrack. There are a few ways around this. Probably the simplest is just making the * optional in another rule:
lit :: Parser (Char, Maybe Char)
lit = do
c <- noneOf "*"
s <- optionMaybe $ char '*'
return (c, s)
Of course, you’ll probably end up wrapping things in a data type anyway, and there are a lot of ways to go about it. Here’s one, off the top of my head:
import Control.Applicative ((<$>))
data Term = Literal Char
| Sequence [Term]
| Star Term
expr :: Parser Term
expr = Sequence <$> many term
term :: Parser Term
term = do
c <- lit
s <- optionMaybe $ char '*' -- Easily extended for +, ?, etc.
return $ if isNothing s
then Literal c
else Star $ Literal c
Maybe a more experienced Haskeller will come along with a better solution.
In my work I come across a lot of gnarly sql, and I had the bright idea of writing a program to parse the sql and print it out neatly. I made most of it pretty quickly, but I ran into a problem that I don't know how to solve.
So let's pretend the sql is "select foo from bar where 1". My thought was that there is always a keyword followed by data for it, so all I have to do is parse a keyword, and then capture all gibberish before the next keyword and store that for later cleanup, if it is worthwhile. Here's the code:
import Text.Parsec
import Text.Parsec.Combinator
import Text.Parsec.Char
import Data.Text (strip)
newtype Statement = Statement [Atom]
data Atom = Branch String [Atom] | Leaf String deriving Show
trim str = reverse $ trim' (reverse $ trim' str)
where
trim' (' ':xs) = trim' xs
trim' str = str
printStatement atoms = mapM_ printAtom atoms
printAtom atom = loop 0 atom
where
loop depth (Leaf str) = putStrLn $ (replicate depth ' ') ++ str
loop depth (Branch str atoms) = do
putStrLn $ (replicate depth ' ') ++ str
mapM_ (loop (depth + 2)) atoms
keywords :: [String]
keywords = [
"select",
"update",
"delete",
"from",
"where"]
keywordparser :: Parsec String u String
keywordparser = try ((choice $ map string keywords) <?> "keywordparser")
stuffparser :: Parsec String u String
stuffparser = manyTill anyChar (eof <|> (lookAhead keywordparser >> return ()))
statementparser = do
key <- keywordparser
stuff <- stuffparser
return $ Branch key [Leaf (trim stuff)]
<?> "statementparser"
tp = parse (many statementparser) ""
The key here is the stuffparser. That is the stuff in between the keywords that could be anything from column lists to where criteria. This function catches all characters leading up to a keyword. But it needs something else before it is finished. What if there is a subselect? "select id,(select product from products) from bar". Well in that case if it hits that keyword, it screws everything up, parses it wrong and screws up my indenting. Also where clauses can have parenthesis as well.
So I need to change that anyChar into another combinator that slurps up characters one at a time but also tries to look for parenthesis, and if it finds them, traverse and capture all that, but also if there are more parenthesis, do that until we have fully closed the parenthesis, then concatenate it all and return it. Here's what I've tried, but I can't quite get it to work.
stuffparser :: Parsec String u String
stuffparser = fmap concat $ manyTill somechars (eof <|> (lookAhead keywordparser >> return ()))
where
somechars = parens <|> fmap (\c -> [c]) anyChar
parens= between (char '(') (char ')') somechars
This will error like so:
> tp "select asdf(qwerty) from foo where 1"
Left (line 1, column 14):
unexpected "w"
expecting ")"
But I can't think of any way to rewrite this so that it works. I've tried to use manyTill on the parenthesis part, but I end up having trouble getting it to typecheck when I have both string producing parens and single chars as alternatives. Does anyone have any suggestions on how to go about this?
Yeah, between might not work for what you're looking for. Of course, for your use case, I'd follow hammar's suggestion and grab an off-the-shelf SQL parser. (personal opinion: or, try not to use SQL unless you really have to; the idea to use strings for database queries was imho a historical mistake).
Note: I add an operator called <++> which will concatenate the results of two parsers, whether they are strings or characters. (code at bottom.)
First, for the task of parsing parenthesis: the top level will parse some stuff between the relevant characters, which is exactly what the code says,
parseParen = char '(' <++> inner <++> char ')'
Then, the inner function should parse anything else: non-parens, possibly including another set of parenthesis, and non-paren junk that follows.
parseParen = char '(' <++> inner <++> char ')' where
inner = many (noneOf "()") <++> option "" (parseParen <++> inner)
I'll make the assumption that for the rest of the solution, what you want to do is analgous to splitting things up by top-level SQL keywords. (i.e. ignoring those in parenthesis). Namely, we'll have a parser that will behave like so,
Main> parseTest parseSqlToplevel "select asdf(select m( 2) fr(o)m w where n) from b where delete 4"
[(Select," asdf(select m( 2) fr(o)m w where n) "),(From," b "),(Where," "),(Delete," 4")]
Suppose we have a parseKw parser that will get the likes of select, etc. After we consume a keyword, we need to read until the next [top-level] keyword. The last trick to my solution is using the lookAhead combinator to determine whether the next word is a keyword, and put it back if so. If it's not, then we consume a parenthesis or other character, and then recurse on the rest.
-- consume spaces, then eat a word or parenthesis
parseOther = many space <++>
(("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
option "" ((parseParen <|> many1 (noneOf "() \t")) <++> parseOther))
My entire solution is as follows
-- overloaded operator to concatenate string results from parsers
class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id
infixl 4 <++>
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g
data Keyword = Select | Update | Delete | From | Where deriving (Eq, Show)
parseKw =
(Select <$ string "select") <|>
(Update <$ string "update") <|>
(Delete <$ string "delete") <|>
(From <$ string "from") <|>
(Where <$ string "where") <?>
"keyword (select, update, delete, from, where)"
-- consume spaces, then eat a word or parenthesis
parseOther = many space <++>
(("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
option "" ((parseParen <|> many1 (noneOf "() \t")) <++> parseOther))
parseSqlToplevel = many ((,) <$> parseKw <*> (space <++> parseOther)) <* eof
parseParen = char '(' <++> inner <++> char ')' where
inner = many (noneOf "()") <++> option "" (parseParen <++> inner)
edit - version with quote support
you can do the same thing as with the parens to support quotes,
import Control.Applicative hiding (many, (<|>))
import Text.Parsec
import Text.Parsec.Combinator
-- overloaded operator to concatenate string results from parsers
class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id
infixl 4 <++>
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g
data Keyword = Select | Update | Delete | From | Where deriving (Eq, Show)
parseKw =
(Select <$ string "select") <|>
(Update <$ string "update") <|>
(Delete <$ string "delete") <|>
(From <$ string "from") <|>
(Where <$ string "where") <?>
"keyword (select, update, delete, from, where)"
-- consume spaces, then eat a word or parenthesis
parseOther = many space <++>
(("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
option "" ((parseParen <|> parseQuote <|> many1 (noneOf "'() \t")) <++> parseOther))
parseSqlToplevel = many ((,) <$> parseKw <*> (space <++> parseOther)) <* eof
parseQuote = char '\'' <++> inner <++> char '\'' where
inner = many (noneOf "'\\") <++>
option "" (char '\\' <++> anyChar <++> inner)
parseParen = char '(' <++> inner <++> char ')' where
inner = many (noneOf "'()") <++>
(parseQuote <++> inner <|> option "" (parseParen <++> inner))
I tried it with parseTest parseSqlToplevel "select ('a(sdf'())b". cheers