Parser for binary operations between Rationals and Arithmetic expressions in Haskell - parsing

Inspired by the following project, I am working with linear expressions and I have defined the following structure and parser.
data AExp
= Lit Rational
| Var String
| AExp :+: AExp
| Rational :*: AExp
deriving (Eq)
import Text.Parsec
import Text.Parsec.Char
import Text.Parsec.Expr
import Text.Parsec.Language (javaStyle)
import Text.Parsec.String
import Control.Monad (void, ap)
import qualified Text.Parsec.Token as Token
Token.TokenParser {..} = Token.makeTokenParser javaStyle
binary name fun = Infix (fun <$ reservedOp name) AssocLeft
whitespace :: Parser ()
whitespace = void $ many $ oneOf " \n\t"
regularParse :: Parser a -> String -> Either ParseError a
regularParse p = parse p ""
rational :: Parser Rational
rational = do
whitespace
num <- many1 digit
void $ char '/'
den <- many1 digit
whitespace
return $ toRational $ (read num)/ (read den)
aexp :: Parser AExp
aexp = buildExpressionParser table term
where term = Lit <$> rational
<|> Var <$> identifier
<|> try ((:*:) <$> (rational <* reservedOp "*") <*> aexp)
<|> try (parens aexp)
table = [ [ binary "+" (:+:)]]
My problem is multiplication (:*:) and in general binary operations between two different types (Rational - AExp). The following example shows my result.
Main>regularParse aexp "10/1 * x"
Right (Lit 10 % 1)
That is, it does not match with multiplication (*), it matches with the literal variable Lit.
I have looked for some examples of parser of expressions, but in which I always found the multiplication is a binary operation between AExp, that is, the structure and parser are of this style:
data AExp
= Lit Rational
| Var String
| AExp:+: AExp
| AExp:*: AExp
deriving (Eq)
aexp :: Parser AExp
aexp = buildExpressionParser table term
where term = Lit <$> rational
<|> Var <$> identifier
<|> try ((:*:) <$> (rational <* reservedOp "*") <*> aexp)
<|> try (parens aexp)
table = [[binary "+" (:+:)],
[binary "*" (:*:)]]
The project I am following is defined in this way.
I could try to take another focus for the parser, but for me it is easier to follow the project guide, since many of my structures are very similar.
How could I define the multiplication parser (:*:) or is there any example with to guide me?
Thanks in advance

The problem is here:
term = Lit <$> rational
<|> ...
<|> try ((:*:) <$> (rational <* reservedOp "*") <*> aexp)
<|> ...
This will try parsing a lone rational first; if that succeeds, the remaining branches will not be attempted. This policy of committing to the first successful parse was chosen for efficiency. Just changing the order will get you over this hump and to whatever your next problem will be (there's always one, isn't there??).
term = try ((:*:) <$> (rational <* reservedOp "*") <*> aexp)
<|> Lit <$> rational
<|> ...

I thought a bit more about my original problem. I want to make a parser for linear expressions.
A limited, but sufficient solution is:
varAExp :: Parser AExp
varAExp = do
x <- identifier
return $ Var x
aexp :: Parser AExp
aexp = buildExpressionParser table term
where term = try ((:*:) <$> (rational <* reservedOp "*") <*> varAExp)
<|> Lit <$> rational
<|> Var <$> identifier
<|> try (parens aexp)
table = [[binary "+" (:+:) ]]
With this parser I can work with expressions of this type:
arit_1 = regularParse aexp "10/1*x + 3/2*y + 1/1 + 3/1"
I cannot express the multiplication of numbers, but as I said before it is enough for me.
thanks for your help

Related

Parsing a Formal Logic in Haskell

I have the following language I am trying to parse.
formula ::= true
| false
| var
| formula & formula
| ∀ var . formula
| (formula)
var ::= letter { letter | digit }*
I have been following this article on the Haskell wiki and have used Parsec combinators to create the following parser.
module LogicParser where
import System.IO
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Language
import qualified Text.ParserCombinators.Parsec.Token as Token
-- Data Structures
data Formula = Var String
| TT
| FF
| Con Formula Formula
| Forall String Formula
deriving (Show)
-- Language Definition
lang :: LanguageDef st
lang =
emptyDef{ Token.identStart = letter
, Token.identLetter = alphaNum
, Token.opStart = oneOf "&."
, Token.opLetter = oneOf "&."
, Token.reservedOpNames = ["&", "."]
, Token.reservedNames = ["tt", "ff", "forall"]
}
-- Lexer for langauge
lexer =
Token.makeTokenParser lang
-- Trivial Parsers
identifier = Token.identifier lexer
keyword = Token.reserved lexer
op = Token.reservedOp lexer
roundBrackets = Token.parens lexer
whiteSpace = Token.whiteSpace lexer
-- Main Parser, takes care of trailing whitespaces
formulaParser :: Parser Formula
formulaParser = whiteSpace >> formula
-- Parsing Formulas
formula :: Parser Formula
formula = conFormula
<|> formulaTerm
<|> forallFormula
-- Term in a Formula
formulaTerm :: Parser Formula
formulaTerm = roundBrackets formula
<|> ttFormula
<|> ffFormula
<|> varFormula
-- Conjunction
conFormula :: Parser Formula
conFormula =
buildExpressionParser [[Infix (op "&" >> return Con) AssocLeft]] formula
-- Truth
ttFormula :: Parser Formula
ttFormula = keyword "tt" >> return TT
-- Falsehood
ffFormula :: Parser Formula
ffFormula = keyword "ff" >> return FF
-- Variable
varFormula :: Parser Formula
varFormula =
do var <- identifier
return $ Var var
-- Universal Statement
forallFormula :: Parser Formula
forallFormula =
do keyword "forall"
x <- identifier
op "."
phi <- formulaTerm
return $ Forall x phi
-- For running runghc
calculate :: String -> String
calculate s =
case ret of
Left e -> "Error: " ++ (show e)
Right n -> "Interpreted as: " ++ (show n)
where
ret = parse formulaParser "" s
main :: IO ()
main = interact (unlines . (map calculate) . lines)
The problem is the & operator, I am trying to model it on how the article parses expressions, using the Infix function and passing it a list of operators and a parser to parse terms. However, the parser is not behaving as desired and I can't work out why. Here are some examples of desired behaviour:
true -- parsing --> TT
true & false -- parsing --> Con TT FF
true & (true & false) -- parsing --> Con TT (Con TT FF)
forall x . false & true -- parsing --> Con (Forall "x" FF) TT
However currently the parser is producing no output. I appreciate any assistance.
I think I managed to fix the problem by rephrasing the grammar, and conjunction is still left-associative:
formula :: Parser Formula
formula = conFormula
<|> formulaTerm
formulaTerm :: Parser Formula
formulaTerm = roundBrackets formula
<|> ttFormula
<|> ffFormula
<|> varFormula
<|> forallFormula
conFormula :: Parser Formula
conFormula =
buildExpressionParser [[Infix (op "&" >> return Con) AssocLeft]] formulaTerm
This parsed the following successfully.
The problem in your code is, that you try to parse a formula for which the first attempt is to parse a formulaCon. This then first tries to parse a formula, i.e. you create an infinite recursion without consuming any input.
To resolve this, you have to structure your grammar. Define your terms like this (note that all these terms consume some input before doing a recursion back to formula):
formulaTerm = ttFormula
<|> ffFormula
<|> varFormula
<|> forallFormula
<|> roundBrackets formula
forallFormula = do
keyword "forall"
x <- identifier
op "."
phi <- formula
return $ Forall x phi
A formula is then either a single term or a conjunction consisting of term, an operator, and another formula. To ensure that all input is parsed, first try to parse a conjunction and - if that fails - parse a single term:
formula = (try formulaCon) <|> formulaTerm
Finally a formulaCon can be parsed like this:
formulaCon = do
f1 <- formulaTerm
op "&"
f2 <- formula
return $ Con f1 f2
The drawback with this solution is that conjunctions are now right associative.

Parsing juxtaposition-based, indentation-aware syntax using Text.Parsec.Layout

I'm trying to parse a small language with Haskell-like syntax, using parsec-layout. The two key features that don't seem to interact too well with each other are:
Function application syntax is juxtaposition, i.e. if F and E are terms, F E is the syntax for F applied to E.
Indentation can be used to denote nesting, i.e. the following two are equivalent:
X = case Y of
A -> V
B -> W
X = case Y of A -> V; B -> W
I haven't managed to figure out a combination of skipping and keeping whitespace that would allow me to parse a list of such definitions. Here's my simplified code:
import Text.Parsec hiding (space, runP)
import Text.Parsec.Layout
import Control.Monad (void)
type Parser = Parsec String LayoutEnv
data Term = Var String
| App Term Term
| Case Term [(String, Term)]
deriving Show
name :: Parser String
name = spaced $ (:) <$> upper <*> many alphaNum
kw :: String -> Parser ()
kw = void . spaced . string
reserved :: String -> Parser ()
reserved s = try $ spaced $ string s >> notFollowedBy alphaNum
term :: Parser Term
term = foldl1 App <$> part `sepBy1` space
where
part = choice [ caseBlock
, Var <$> name
]
caseBlock = Case <$> (reserved "case" *> term <* reserved "of") <*> laidout alt
alt = (,) <$> (name <* kw "->") <*> term
binding :: Parser (String, Term)
binding = (,) <$> (name <* kw "=") <*> term
-- https://github.com/luqui/parsec-layout/issues/1
trim :: String -> String
trim = reverse . dropWhile (== '\n') . reverse
runP :: Parser a -> String -> Either ParseError a
runP p = runParser (p <* eof) defaultLayoutEnv "" . trim
If I try to run it on input like
s = unlines [ "A = case B of"
, " X -> Y Z"
, "C = D"
]
via runP (laidout binding) s, it fails on the application Y Z:
(line 2, column 10):
expecting space or semi-colon
However, if I change the definition of term to
term = foldl1 App <$> many1 part
then it doesn't stop parsing the term at the start of the (unindented!) third line, leading to
(line 3, column 4):
expecting semi-colon
I think the problem has to do with that name already eliminates the following space, so the sepBy1 in the definition of term doesn't see it.
Consider these simplified versions of term:
term0 = foldl1 App <$> (Var <$> name) `sepBy1` space
term1 = foldl1 App <$> (Var <$> name') `sepBy1` space
name' = (:) <$> upper <*> many alphaNum
term3 = foldl1 App <$> many (Var <$> name)
Then:
runP term0 "A B C" -- fails
runP term1 "A B C" -- succeeds
runP term3 "A B C" -- succeeds
I think part of the solution is to define
part = [ caseBlock, Var <$> name' ]
where name' is as above. However, there are still some issues.

Is there any trick about translating BNF to Parsec program?

The BNF that match function call chain (like x(y)(z)...):
expr = term T
T = (expr) T
| EMPTY
term = (expr)
| VAR
Translate it to Parsec program that looks so tricky.
term :: Parser Term
term = parens expr <|> var
expr :: Parser Term
expr = do whiteSpace
e <- term
maybeAddSuffix e
where addSuffix e0 = do e1 <- parens expr
maybeAddSuffix $ TermApp e0 e1
maybeAddSuffix e = addSuffix e
<|> return e
Could you list all the design patterns about translating BNF to Parsec program?
The simplest think you could do if your grammar is sizeable is to just use the Alex/Happy combo. It is fairly straightforward to use, accepts the BNF format directly - no human translation needed - and perhaps most importantly, produces blazingly fast parsers/lexers.
If you are dead set on doing it with parsec (or you are doing this as a learning exercise), I find it easier in general to do it in two stages; first lexing, then parsing. Parsec will do both!
First write the appropriate types:
{-# LANGUAGE LambdaCase #-}
import Text.Parsec
import Text.Parsec.Combinator
import Text.Parsec.Prim
import Text.Parsec.Pos
import Text.ParserCombinators.Parsec.Char
import Control.Applicative hiding ((<|>))
import Control.Monad
data Term = App Term Term | Var String deriving (Show, Eq)
data Token = LParen | RParen | Str String deriving (Show, Eq)
type Lexer = Parsec [Char] () -- A lexer accepts a stream of Char
type Parser = Parsec [Token] () -- A parser accepts a stream of Token
Parsing a single token is simple. For simplicity, a variable is 1 or more letters. You can of course change this however you like.
oneToken :: Lexer Token
oneToken = (char '(' >> return LParen) <|>
(char ')' >> return RParen) <|>
(Str <$> many1 letter)
Parsing the entire token stream is just parsing a single token many times, possible separated by whitespace:
lexer :: Lexer [Token]
lexer = spaces >> many1 (oneToken <* spaces)
Note the placement of spaces: this way, white space is accepted at the beginning and end of the string.
Since Parser uses a custom token type, you have to use a custom satisfy function. Fortunately, this is almost identical to the existing satisfy.
satisfy' :: (Token -> Bool) -> Parser Token
satisfy' f = tokenPrim show
(\src _ _ -> incSourceColumn src 1)
(\x -> if f x then Just x else Nothing)
Then we can write parsers for each of the primitive tokens.
lparen = satisfy' $ \case { LParen -> True ; _ -> False }
rparen = satisfy' $ \case { RParen -> True ; _ -> False }
strTok = (\(Str s) -> s) <$> (satisfy' $ \case { Str {} -> True ; _ -> False })
As you may imagine, parens would be useful for our purposes. It is very straightforward to write.
parens :: Parser a -> Parser a
parens = between lparen rparen
Now the interesting parts.
term, expr, var :: Parser Term
term = parens expr <|> var
var = Var <$> strTok
These two should be fairly obvious to you.
Parec contains combinators option and optionMaybe which are useful when you you need to "maybe do something".
expr = do
e0 <- term
option e0 (parens expr >>= \e1 -> return (App e0 e1))
The last line means - try to apply the parser given to option - if it fails, instead return e0.
For testing you can do:
tokAndParse = runParser (lexer <* eof) () "" >=> runParser (expr <* eof) () ""
The eof attached to each parser is to make sure that the entire input is consumed; the string cannot be a member of the grammar if there are extra trailing characters. Note - your example x(y)(z) is not actually in your grammar!
>tokAndParse "x(y)(z)"
Left (line 1, column 5):
unexpected LParen
expecting end of input
But the following is
>tokAndParse "(x(y))(z)"
Right (App (App (Var "x") (Var "y")) (Var "z"))

Writing Parser for S Expressions

I'm trying to write a Parser for S Expressions from Prof. Yorgey's 2013 homework.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }
Given the following definitions, presented in the homework:
type Ident = String
-- An "atom" is either an integer value or an identifier.
data Atom = N Integer | I Ident
deriving Show
-- An S-expression is either an atom, or a list of S-expressions.
data SExpr = A Atom
| Comb [SExpr]
deriving Show
I wrote a parser for Parser Atom and Parser SExpr for A Atom.
parseAtom :: Parser Atom
parseAtom = alt n i
where n = (\_ z -> N z) <$> spaces <*> posInt
i = (\ _ z -> I z) <$> spaces <*> ident
parseAAtom :: Parser SExpr
parseAAtom = fmap (\x -> A x) parseAtom
Then, I attempted to write a parser to handle a Parser SExpr for the Comb ... case:
parseComb :: Parser SExpr
parseComb = (\_ _ x _ _ _ -> x) <$> (zeroOrMore spaces) <*> (char '(') <*>
(alt parseAAtom parseComb) <*> (zeroOrMore spaces)
<*> (char ')') <*> (zeroOrMore spaces)
Assuming that parseComb was right, I could simply make usage of oneOrMore for Parser [SExpr].
parseCombElements :: Parser [SExpr]
parseCombElements = oneOrMore parseComb
So, my two last functions compile, but running runParser parseComb "( foo )" never terminates.
What's wrong with my parseComb definition? Please don't give me the whole answer, but rather a hint - for my own learning.
I am very suspicious of zeroOrMore spaces, because spaces is usually a parser which itself parses zero or more spaces. Which means that it can parse the empty string if there aren't any spaces at that point. In particular, the spaces parser always succeeds.
But when you apply zeroOrMore to a parser that always succeeds, the combined parser will never stop - because zeroOrMore only stops trying again once its parser argument fails.
As an aside, Applicative expressions like (\_ _ x _ _ _ -> x) <$> ... <*> ... <*> ...... which only use a single of the subparsers can usually be written more succinctly with the *> and <* combinators:
... *> ... *> x_parser_here <* ... <* ...

How do you use parsec in a greedy fashion?

In my work I come across a lot of gnarly sql, and I had the bright idea of writing a program to parse the sql and print it out neatly. I made most of it pretty quickly, but I ran into a problem that I don't know how to solve.
So let's pretend the sql is "select foo from bar where 1". My thought was that there is always a keyword followed by data for it, so all I have to do is parse a keyword, and then capture all gibberish before the next keyword and store that for later cleanup, if it is worthwhile. Here's the code:
import Text.Parsec
import Text.Parsec.Combinator
import Text.Parsec.Char
import Data.Text (strip)
newtype Statement = Statement [Atom]
data Atom = Branch String [Atom] | Leaf String deriving Show
trim str = reverse $ trim' (reverse $ trim' str)
where
trim' (' ':xs) = trim' xs
trim' str = str
printStatement atoms = mapM_ printAtom atoms
printAtom atom = loop 0 atom
where
loop depth (Leaf str) = putStrLn $ (replicate depth ' ') ++ str
loop depth (Branch str atoms) = do
putStrLn $ (replicate depth ' ') ++ str
mapM_ (loop (depth + 2)) atoms
keywords :: [String]
keywords = [
"select",
"update",
"delete",
"from",
"where"]
keywordparser :: Parsec String u String
keywordparser = try ((choice $ map string keywords) <?> "keywordparser")
stuffparser :: Parsec String u String
stuffparser = manyTill anyChar (eof <|> (lookAhead keywordparser >> return ()))
statementparser = do
key <- keywordparser
stuff <- stuffparser
return $ Branch key [Leaf (trim stuff)]
<?> "statementparser"
tp = parse (many statementparser) ""
The key here is the stuffparser. That is the stuff in between the keywords that could be anything from column lists to where criteria. This function catches all characters leading up to a keyword. But it needs something else before it is finished. What if there is a subselect? "select id,(select product from products) from bar". Well in that case if it hits that keyword, it screws everything up, parses it wrong and screws up my indenting. Also where clauses can have parenthesis as well.
So I need to change that anyChar into another combinator that slurps up characters one at a time but also tries to look for parenthesis, and if it finds them, traverse and capture all that, but also if there are more parenthesis, do that until we have fully closed the parenthesis, then concatenate it all and return it. Here's what I've tried, but I can't quite get it to work.
stuffparser :: Parsec String u String
stuffparser = fmap concat $ manyTill somechars (eof <|> (lookAhead keywordparser >> return ()))
where
somechars = parens <|> fmap (\c -> [c]) anyChar
parens= between (char '(') (char ')') somechars
This will error like so:
> tp "select asdf(qwerty) from foo where 1"
Left (line 1, column 14):
unexpected "w"
expecting ")"
But I can't think of any way to rewrite this so that it works. I've tried to use manyTill on the parenthesis part, but I end up having trouble getting it to typecheck when I have both string producing parens and single chars as alternatives. Does anyone have any suggestions on how to go about this?
Yeah, between might not work for what you're looking for. Of course, for your use case, I'd follow hammar's suggestion and grab an off-the-shelf SQL parser. (personal opinion: or, try not to use SQL unless you really have to; the idea to use strings for database queries was imho a historical mistake).
Note: I add an operator called <++> which will concatenate the results of two parsers, whether they are strings or characters. (code at bottom.)
First, for the task of parsing parenthesis: the top level will parse some stuff between the relevant characters, which is exactly what the code says,
parseParen = char '(' <++> inner <++> char ')'
Then, the inner function should parse anything else: non-parens, possibly including another set of parenthesis, and non-paren junk that follows.
parseParen = char '(' <++> inner <++> char ')' where
inner = many (noneOf "()") <++> option "" (parseParen <++> inner)
I'll make the assumption that for the rest of the solution, what you want to do is analgous to splitting things up by top-level SQL keywords. (i.e. ignoring those in parenthesis). Namely, we'll have a parser that will behave like so,
Main> parseTest parseSqlToplevel "select asdf(select m( 2) fr(o)m w where n) from b where delete 4"
[(Select," asdf(select m( 2) fr(o)m w where n) "),(From," b "),(Where," "),(Delete," 4")]
Suppose we have a parseKw parser that will get the likes of select, etc. After we consume a keyword, we need to read until the next [top-level] keyword. The last trick to my solution is using the lookAhead combinator to determine whether the next word is a keyword, and put it back if so. If it's not, then we consume a parenthesis or other character, and then recurse on the rest.
-- consume spaces, then eat a word or parenthesis
parseOther = many space <++>
(("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
option "" ((parseParen <|> many1 (noneOf "() \t")) <++> parseOther))
My entire solution is as follows
-- overloaded operator to concatenate string results from parsers
class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id
infixl 4 <++>
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g
data Keyword = Select | Update | Delete | From | Where deriving (Eq, Show)
parseKw =
(Select <$ string "select") <|>
(Update <$ string "update") <|>
(Delete <$ string "delete") <|>
(From <$ string "from") <|>
(Where <$ string "where") <?>
"keyword (select, update, delete, from, where)"
-- consume spaces, then eat a word or parenthesis
parseOther = many space <++>
(("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
option "" ((parseParen <|> many1 (noneOf "() \t")) <++> parseOther))
parseSqlToplevel = many ((,) <$> parseKw <*> (space <++> parseOther)) <* eof
parseParen = char '(' <++> inner <++> char ')' where
inner = many (noneOf "()") <++> option "" (parseParen <++> inner)
edit - version with quote support
you can do the same thing as with the parens to support quotes,
import Control.Applicative hiding (many, (<|>))
import Text.Parsec
import Text.Parsec.Combinator
-- overloaded operator to concatenate string results from parsers
class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id
infixl 4 <++>
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g
data Keyword = Select | Update | Delete | From | Where deriving (Eq, Show)
parseKw =
(Select <$ string "select") <|>
(Update <$ string "update") <|>
(Delete <$ string "delete") <|>
(From <$ string "from") <|>
(Where <$ string "where") <?>
"keyword (select, update, delete, from, where)"
-- consume spaces, then eat a word or parenthesis
parseOther = many space <++>
(("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
option "" ((parseParen <|> parseQuote <|> many1 (noneOf "'() \t")) <++> parseOther))
parseSqlToplevel = many ((,) <$> parseKw <*> (space <++> parseOther)) <* eof
parseQuote = char '\'' <++> inner <++> char '\'' where
inner = many (noneOf "'\\") <++>
option "" (char '\\' <++> anyChar <++> inner)
parseParen = char '(' <++> inner <++> char ')' where
inner = many (noneOf "'()") <++>
(parseQuote <++> inner <|> option "" (parseParen <++> inner))
I tried it with parseTest parseSqlToplevel "select ('a(sdf'())b". cheers

Resources