Propositions Parser using recursion in Elm

Yesterday, I posted about an assignment I got for parsing logical propositions. After a lot of research and trying different things, I got it to work for individual propositions: going from a string to my own custom type Proposition. However, I am now at a complete roadblock: I have almost no idea how to combine these components so that they work for more complex propositions, and I am not even sure whether they are suitable to be combined at all. You will find the code and a screenshot of my current output below. Any advice or suggested approaches would be greatly appreciated!
type Proposition
    = A
    | B
    | C
    | And Proposition Proposition
    | Or Proposition Proposition
    | Implies Proposition Proposition
    | Not Proposition
    | Equal Proposition Proposition
andParser : Parser Proposition
andParser =
    oneOf
        [ succeed A
            |. keyword "A"
        , succeed B
            |. keyword "B"
        , succeed C
            |. keyword "C"
        , succeed And
            |. symbol "&"
            |. spaces
            |. symbol "("
            |. spaces
            |= lazy (\_ -> andParser)
            |. spaces
            |. symbol ","
            |. spaces
            |= lazy (\_ -> andParser)
            |. spaces
            |. symbol ")"
        ]
orParser : Parser Proposition
orParser =
    oneOf
        [ succeed A
            |. keyword "A"
        , succeed B
            |. keyword "B"
        , succeed C
            |. keyword "C"
        , succeed Or
            |. symbol "|"
            |. spaces
            |. symbol "("
            |. spaces
            |= lazy (\_ -> orParser)
            |. spaces
            |. symbol ","
            |. spaces
            |= lazy (\_ -> orParser)
            |. spaces
            |. symbol ")"
        ]
implParser : Parser Proposition
implParser =
    oneOf
        [ succeed A
            |. keyword "A"
        , succeed B
            |. keyword "B"
        , succeed C
            |. keyword "C"
        , succeed Implies
            |. symbol ">"
            |. spaces
            |. symbol "("
            |. spaces
            |= lazy (\_ -> implParser)
            |. spaces
            |. symbol ","
            |. spaces
            |= lazy (\_ -> implParser)
            |. spaces
            |. symbol ")"
        ]
equalParser : Parser Proposition
equalParser =
    oneOf
        [ succeed A
            |. keyword "A"
        , succeed B
            |. keyword "B"
        , succeed C
            |. keyword "C"
        , succeed Equal
            |. symbol "="
            |. spaces
            |. symbol "("
            |. spaces
            |= lazy (\_ -> equalParser)
            |. spaces
            |. symbol ","
            |. spaces
            |= lazy (\_ -> equalParser)
            |. spaces
            |. symbol ")"
        ]
notParser : Parser Proposition
notParser =
    oneOf
        [ succeed A
            |. keyword "A"
        , succeed B
            |. keyword "B"
        , succeed C
            |. keyword "C"
        , succeed Not
            |. symbol "N"
            |. symbol "("
            |. spaces
            |= lazy (\_ -> notParser)
            |. spaces
            |. symbol ")"
        ]
my_results : List String
my_results =
    [ "and parser test ____ & (A , C) ______"
    , pr <| Parser.run andParser "& (A , C)"
    , "or parser test ____ | (A , B) ______"
    , pr <| Parser.run orParser "| (A , B)"
    , "implies parser test ____ > (A , B) ______"
    , pr <| Parser.run implParser "> (A , B)"
    , "equal parser test ____ = (A , B) ______"
    , pr <| Parser.run equalParser "= (A , B)"
    , "not parser test ____ N(B) ______"
    , pr <| Parser.run notParser "N(B)"
    , "parsing & ( A, N(B) )"
    , pr <| Parser.run andParser "& ( A, N(B) ) "
    ]
Code output so far:

You're almost there, but you need a parser that can parse any kind of proposition, and you need to call that recursively instead of the individual parsers. There are a couple of ways you can do that. The easiest is to just put all your existing parsers in a oneOf:
propositionParser : Parser Proposition
propositionParser =
    oneOf
        [ andParser
        , orParser
        , implParser
        , equalParser
        , notParser
        ]
and just call that from the other parsers, e.g. from notParser:
notParser : Parser Proposition
notParser =
    oneOf
        [ succeed A
            |. keyword "A"
        , succeed B
            |. keyword "B"
        , succeed C
            |. keyword "C"
        , succeed Not
            |. symbol "N"
            |. symbol "("
            |. spaces
            |= lazy (\_ -> propositionParser)
            |. spaces
            |. symbol ")"
        ]
This has a lot of duplication though, as the variables are parsed for each expression, which would also make it error-prone to add more variables. So let's simplify by moving those into propositionParser:
propositionParser : Parser Proposition
propositionParser =
    oneOf
        [ succeed A
            |. keyword "A"
        , succeed B
            |. keyword "B"
        , succeed C
            |. keyword "C"
        , andParser
        , orParser
        , implParser
        , equalParser
        , notParser
        ]
which will allow us to remove the oneOf from the individual parsers, since they're only handling one case each:
notParser : Parser Proposition
notParser =
    succeed Not
        |. symbol "N"
        |. symbol "("
        |. spaces
        |= lazy (\_ -> propositionParser)
        |. spaces
        |. symbol ")"
You should now be able to see that the structure of the parser mirrors the type it parses. We now have a propositionParser with a oneOf where each case corresponds to a case of the Proposition type, and each individual case parser uses propositionParser wherever the type says it needs a Proposition. Knowing this, you should be able to create a parser for any custom type by writing small parsers for each individual piece and then combining them by mimicking the structure of the type.
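For comparison with the Haskell threads further down, the same "parser mirrors the type" shape can be sketched with ReadP from base. This is a transliteration, not the elm/parser code above, and the helper names sym, binary, notP, proposition and parseProposition are made up for this sketch. It only assumes the prefix syntax used in the question, such as & (A , B) and N(B).

```haskell
import Text.ParserCombinators.ReadP

data Proposition
  = A
  | B
  | C
  | And Proposition Proposition
  | Or Proposition Proposition
  | Implies Proposition Proposition
  | Not Proposition
  | Equal Proposition Proposition
  deriving (Eq, Show)

-- A single character with optional whitespace on both sides (hypothetical helper).
sym :: Char -> ReadP Char
sym c = skipSpaces *> char c <* skipSpaces

-- Shared shape of the four binary cases: op ( p , p )
binary :: Char -> (Proposition -> Proposition -> Proposition) -> ReadP Proposition
binary op con = do
  _ <- sym op
  _ <- sym '('
  l <- proposition -- recursive position: any proposition is allowed here
  _ <- sym ','
  r <- proposition
  _ <- sym ')'
  pure (con l r)

-- The unary case: N ( p )
notP :: ReadP Proposition
notP = do
  _ <- sym 'N'
  _ <- sym '('
  p <- proposition
  _ <- sym ')'
  pure (Not p)

-- One alternative per constructor of Proposition.
proposition :: ReadP Proposition
proposition =
  choice
    [ A <$ char 'A'
    , B <$ char 'B'
    , C <$ char 'C'
    , binary '&' And
    , binary '|' Or
    , binary '>' Implies
    , binary '=' Equal
    , notP
    ]

-- Run the parser and require that the whole input is consumed.
parseProposition :: String -> Maybe Proposition
parseProposition s =
  case [p | (p, "") <- readP_to_S (skipSpaces *> proposition <* skipSpaces <* eof) s] of
    (p : _) -> Just p
    [] -> Nothing

main :: IO ()
main = print (parseProposition "& ( A, N(B) ) ")
```

Factoring the four binary operators through one helper is a design choice of this sketch; the Elm version keeps a separate parser per operator, which works just as well.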

Related

Using makeExprParser with ambiguity

I'm currently encountering a problem while translating a parser from a CFG-based tool (antlr) to Megaparsec.
The grammar contains lists of expressions (handled with makeExprParser) that are enclosed in brackets (<, >) and separated by ,.
Stuff like <>, <23>, <23,87> etc.
The problem now is that the expressions may themselves contain the > operator (meaning "greater than"), which causes my parser to fail.
<1223>234> should, for example, be parsed into [BinaryExpression ">" (IntExpr 1223) (IntExpr 234)].
I presume that I have to strategically place try somewhere, but the places I tried (to the first argument of sepBy and the first argument of makeExprParser) did unfortunately not work.
Can I use makeExprParser in such a situation, or do I have to write the expression parser manually?
This is the relevant part of my parser:
-- uses megaparsec, text, and parser-combinators
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Monad.Combinators.Expr
import Data.Text
import Data.Void
import System.Environment
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L
type BinaryOperator = Text
type Name = Text
data Expr
  = IntExpr Integer
  | BinaryExpression BinaryOperator Expr Expr
  deriving (Eq, Show)
type Parser = Parsec Void Text
lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc
symbol :: Text -> Parser Text
symbol = L.symbol sc
sc :: Parser ()
sc = L.space space1 (L.skipLineComment "//") (L.skipBlockCommentNested "/*" "*/")
parseInteger :: Parser Expr
parseInteger = do
  number <- some digitChar
  _ <- sc
  return $ IntExpr $ read number
parseExpr :: Parser Expr
parseExpr = makeExprParser parseInteger [[InfixL (BinaryExpression ">" <$ symbol ">")]]
parseBracketList :: Parser [Expr]
parseBracketList = do
  _ <- symbol "<"
  exprs <- sepBy parseExpr (symbol ",")
  _ <- symbol ">"
  return exprs
main :: IO ()
main = do
  text : _ <- getArgs
  let res = runParser parseBracketList "stdin" (pack text)
  case res of
    (Right suc) -> do
      print suc
    (Left err) ->
      putStrLn $ errorBundlePretty err
You've (probably) misdiagnosed the problem. Your parser fails on <1233>234> because it's trying to parse > as a left associative operator, like +. In other words, the same way:
1+2+
would fail, because the second + has no right-hand operand, your parser is failing because:
1233>234>
has no digit following the second >. Assuming you don't want your > operator to chain (i.e., 1>2>3 is not a valid Expr), you should first replace InfixL with InfixN (non-associative) in your makeExprParser table. Then, it will parse this example fine.
Unfortunately, with or without this change your parser will still fail on the simpler test case:
<1233>
because the > is interpreted as an operator within a continuing expression.
In other words, the problem isn't that your parser can't handle expressions with > characters, it's that it's overly aggressive in treating > characters as part of an expression, preventing them from being recognized as the closing angle bracket.
To fix this, you need to figure out exactly what you're parsing. Specifically, you need to resolve the ambiguity in your parser by precisely characterizing the situations where > can be part of a continuing expression and where it can't.
One rule that will probably work is to only consider a > as an operator if it is followed by a valid "term" (i.e., a parseInteger). You can do this with lookAhead. The parser:
symbol ">" <* lookAhead term
will parse a > operator only if it is followed by a valid term. If it fails to find a term, it will consume some input (at least the > symbol itself), so you must surround it with a try:
try (symbol ">" <* lookAhead term)
With the above two fixes applied to parseExpr:
parseExpr :: Parser Expr
parseExpr = makeExprParser term
  [[InfixN (BinaryExpression ">" <$ try (symbol ">" <* lookAhead term))]]
  where term = parseInteger
you'll get the following parses:
λ> parseTest parseBracketList "<23>"
[IntExpr 23]
λ> parseTest parseBracketList "<23,87>"
[IntExpr 23,IntExpr 87]
λ> parseTest parseBracketList "<23,87>18>"
[IntExpr 23,BinaryExpression ">" (IntExpr 87) (IntExpr 18)]
However, the following will fail:
λ> parseTest parseBracketList "<23,87>18"
1:10:
  |
1 | <23,87>18
  |          ^
unexpected end of input
expecting ',', '>', or digit
λ>
because the fact that the > is followed by 18 means that it is a valid operator, and it is a parse failure that the valid expression 87>18 is followed by neither a comma nor a closing > angle bracket.
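To make the rule concrete without depending on Megaparsec, here is a hand-written sketch of the same decision using only base: a > counts as an operator only when a term follows it, and is otherwise left for the closing bracket. The functions skipWs, term, expr and bracketList are illustrative names for this sketch, not part of any library, and comments are not skipped as they are by sc in the question.

```haskell
import Data.Char (isDigit, isSpace)

data Expr
  = IntExpr Integer
  | BinaryExpression String Expr Expr
  deriving (Eq, Show)

skipWs :: String -> String
skipWs = dropWhile isSpace

-- A term is a run of digits; trailing whitespace is dropped.
term :: String -> Maybe (Expr, String)
term s = case span isDigit s of
  ("", _) -> Nothing
  (ds, rest) -> Just (IntExpr (read ds), skipWs rest)

-- A term, optionally followed by '>' and another term.
-- The '>' is only taken as an operator if a term really follows it
-- (the lookahead rule); otherwise it stays in the remaining input.
-- At most one '>' is consumed, mirroring the non-associative InfixN.
expr :: String -> Maybe (Expr, String)
expr s = do
  (l, rest) <- term s
  case rest of
    '>' : rest'
      | Just (r, rest'') <- term (skipWs rest') ->
          Just (BinaryExpression ">" l r, rest'')
    _ -> Just (l, rest)

-- '<' expr (',' expr)* '>' with nothing but whitespace afterwards.
bracketList :: String -> Maybe [Expr]
bracketList s = case skipWs s of
  '<' : rest -> start (skipWs rest)
  _ -> Nothing
  where
    start ('>' : rest) | all isSpace rest = Just [] -- the empty list <>
    start inp = go inp
    go inp = do
      (e, rest) <- expr inp
      case rest of
        ',' : rest' -> (e :) <$> go (skipWs rest')
        '>' : rest' | all isSpace rest' -> Just [e]
        _ -> Nothing

main :: IO ()
main = mapM_ (print . bracketList) ["<23>", "<23,87>18>", "<23,87>18"]
```

The last input is rejected for the same reason as in the Megaparsec session above: 87>18 is a complete expression, and nothing closes the list after it.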
If you need to parse something like <23,87>18, you have bigger problems. Consider the following two test cases:
<1,2>3,4,5,6,7,...,100000000000,100000000001>
<1,2>3,4,5,6,7,...,100000000000,100000000001
It's a challenge to write an efficient parser that will parse the first one as a list of 10000000000 expressions but the second one as a list of two expressions:
[IntExpr 1, IntExpr 2]
followed by some "extra" text. Hopefully, the underlying "language" you're trying to parse isn't so hopelessly broken that this will be an issue.

Disallowing unnecessary outermost brackets in a BNFC-grammar

This is a continuation to this question I asked earlier about a BNFC-grammar for propositional logic. I got it working with parentheses, as per the definition, but I would now like to extend the grammar to work without parentheses, with a catch however: no unnecessary outer parentheses allowed.
For example, the atomic sentence a should be allowed, but (a) should not be recognized. The sentence (a => b) & c should also be allowed, but ((a => b) & c) should not, and so forth. The last example highlights the necessity for parentheses. The precedence levels are
equivalence <=> and implication =>,
conjunction & and disjunction |
negation - and
atoms.
The higher the level, the earlier it will be parsed.
I got the grammar working with the unnecessary parentheses, by setting precedence levels to the different operators via recursion:
IFF . L ::= L "<=>" L1 ;
IF . L ::= L "=>" L1 ;
AND . L1 ::= L1 "&" L2 ;
OR . L1 ::= L1 "|" L2 ;
NOT . L2 ::= "-" L3 ;
NOT2 . L2 ::= "-" L2 ;
P . L3 ::= Ident ;
_ . L ::= L1 ;
_ . L1 ::= L2 ;
_ . L2 ::= L3 ;
_ . L3 ::= "(" L ")" ;
Now the question is, how do I not allow the outer parentheses, the allowance of which is caused by the last rule L3 ::= "(" L ")";? It is strictly necessary for allowing parentheses inside an expression, but it also allows them on the edges. I guess I need some extra rule for removing ambiguity, but what might that be like?
This grammar also results in about 6 reduce/reduce conflicts, but aren't those pretty much inevitable in recursive definitions?
You can do this by simply banning the parenthesised form from the toplevel. This requires writing the precedence hierarchy in a different fashion, in order to propagate the restriction through the hierarchy. In the following, the r suffix indicates that the production is "restricted" to not be a parenthesised form.
I also fixed the reduce/reduce conflicts by eliminating one of the NOT productions. See below.
(I hope I got the BNFC right. I wrote this in bison and tried to convert the syntax afterwards.)
_ . S ::= L0r ;
IFF . L0r ::= L0 "<=>" L1 ;
IF . L0r ::= L0 "=>" L1 ;
AND . L1r ::= L1 "&" L2 ;
OR . L1r ::= L1 "|" L2 ;
NOT . L2r ::= "-" L2 ;
ID . L2r ::= Ident ;
PAREN . L3 ::= "(" L0 ")" ;
_ . L0r ::= L1r ;
_ . L1r ::= L2r ;
_ . L0 ::= L0r ;
_ . L1 ::= L1r ;
_ . L2 ::= L2r ;
_ . L0 ::= L3 ;
_ . L1 ::= L3 ;
_ . L2 ::= L3 ;
(Edit: I changed the IFF, IF, AND and OR rules by removing the restriction (r) from the first arguments. That allows the rules to match expressions which start with a parenthesis without matching the PAREN syntax.)
If you also wanted to disallow redundant internal parentheses (like ((a & b))), you could change the PAREN rule to
PAREN . L3 ::= "(" L0r ")" ;
which would make the L0 rule unnecessary.
A variant approach which uses fewer unit productions can be found in the answer by @IraBaxter to Grammar for expressions which disallows outer parentheses.
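If the parser is written with combinators rather than generated from the grammar, a different and simpler mechanism gives the same effect: parse with an explicit PAREN node and reject the result when the outermost node is a PAREN. The following ReadP sketch (base only) illustrates that alternative; it is not a translation of the BNFC rules above, and tok, l0, l1, l2, top and parseTop are names invented here.

```haskell
import Data.Char (isAlpha)
import Text.ParserCombinators.ReadP

data L
  = IFF L L
  | IF L L
  | AND L L
  | OR L L
  | NOT L
  | ID String
  | PAREN L
  deriving (Eq, Show)

-- A literal token followed by optional whitespace.
tok :: String -> ReadP String
tok s = string s <* skipSpaces

-- One parser per precedence level, lowest precedence first.
l0, l1, l2 :: ReadP L
l0 = chainl1 l1 ((IFF <$ tok "<=>") +++ (IF <$ tok "=>"))
l1 = chainl1 l2 ((AND <$ tok "&") +++ (OR <$ tok "|"))
l2 =
  choice
    [ NOT <$> (tok "-" *> l2)
    , ID <$> (munch1 isAlpha <* skipSpaces)
    , PAREN <$> between (tok "(") (tok ")") l0
    ]

-- Top level: a full expression whose outermost node is not a PAREN.
top :: ReadP L
top = do
  e <- skipSpaces *> l0 <* eof
  case e of
    PAREN _ -> pfail
    _ -> pure e

parseTop :: String -> Maybe L
parseTop s = case [e | (e, "") <- readP_to_S top s] of
  (e : _) -> Just e
  [] -> Nothing

main :: IO ()
main = mapM_ (print . parseTop) ["a", "(a)", "(a => b) & c", "((a => b) & c)"]
```

Identifiers here are restricted to letters for brevity, which is narrower than BNFC's Ident. Redundant inner parentheses such as ((a & b)) | c are still accepted, as in the first version of the grammar above.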
Side note:
This grammar also results in about 6 reduce/reduce conflicts, but aren't those pretty much inevitable in recursive definitions?
No, recursive grammars can and should be unambiguous. Reduce/reduce conflicts are not inevitable, and almost always indicate problems in the grammar. In this case, they are the result of the redundant productions for the unary NOT operator. Having two different productions which can both accept "-" L3 is obviously going to lead to an ambiguity, and ambiguities always produce parsing conflicts.

Parse EBNF with Megaparsec nested sepBy

As an exercise, I am trying to parse an EBNF/ABNF grammar with Megaparsec. I got trivial stuff like terminals and optionals working, but I'm struggling with alternatives. With this grammar:
S ::= 'hello' ['world'] IDENTIFIER LITERAL | 'test';
And this code:
production :: Parser Production
production = sepBy1 alternativeTerm (char '|') >>= return . Production
alternativeTerm :: Parser AlternativeTerm
alternativeTerm = sepBy1 term space >>= return . AlternativeTerm
term :: Parser Term
term = terminal
  <|> optional
  <|> identifier
  <|> literal
I get this error:
unexpected '|'
expecting "IDENTIFIER", "LITERAL", ''', '[', or white space
I guess the alternativeTerm parser is not returning to the production parser when it encounters a sequence that it cannot parse and throws an error instead.
What can I do about this? Should I change my ADT for the EBNF, or should I somehow flatten the parsing? And if so, how?
It's probably best to expand my previous comment into a full answer.
Your grammar is basically a list of lists of terms separated (and ended) by whitespace, which in turn are separated by |. Your solution with sepBy1 does not work because there is trailing whitespace after LITERAL: sepBy1 assumes there is another term following that whitespace and tries to apply term to the |, which fails.
If your alternativeTerm is guaranteed to end with a whitespace character (or multiple), rewrite your alternativeTerm as follows:
alternativeTerm = (term `sepEndBy1` space) >>= return . AlternativeTerm

How to combine two parsers in Haskell ReadP anonymously?

The Problem
I want to chain two ReadP parsers anonymously.
Example
Input characters are in {'a', 'o', '+', ' '} where ' ' is a space.
I want to parse this input according to the following rules:
the first o or a is not preceded by a space/plus
every other o is preceded by a space
every other a is preceded by a plus
The BNF
In case this is not clear, I came up with the following BNF:
{-
<Input> ::= <O> <Expr> | <A> <Expr>
<Expr> ::= <Space> <O> <Expr> | <Plus> <A> <Expr> | <EOL>
<O> ::= "o"
<A> ::= "a"
<Space> ::= " "
<Plus> ::= "+"
-}
Specific questions
The idea is that the rules "o preceded by space" and "a preceded by plus"
(if not at the beginning) should not be part of the o or a parser, nor do
I want to create an explicit/named parser, because that is a part of the
definition of an expression (Expr).
How do I chain the oParser with the spaceParser?
Composition (does not work): spaceParser . oParser
(+++) is the symmetric choice: oParser +++ spaceParser. And related to that: is (+++) equivalent to <|> from Control.Applicative?
ReadP.choice [oParser, spaceParser] creates a new parser, which obviously does not work for a string, but I did that previously and the behaviour of choice surprised me. How does choice work?
The Code
--------------------------------------------------------------------------------
module Parser (main) where
--------------------------------------------------------------------------------
-- modules
import Text.ParserCombinators.ReadP as ReadP
import Data.Char as D
import Control.Applicative
--------------------------------------------------------------------------------
-- this parses one o
oParser :: ReadP Char
oParser = satisfy (== 'o')
-- this parses one a
aParser :: ReadP Char
aParser = satisfy (== 'a') --(\v -> v == 'a')
-- this parses one plus
pParser :: ReadP Char
pParser = satisfy (== '+')
spaceParser :: ReadP Char
spaceParser = satisfy D.isSpace
parseExpr :: ReadP String
parseExpr = -- ReadP.choice [oParser, spaceParser]
main :: IO ()
main = print [ x | (x, "") <- ReadP.readP_to_S parseExpr "o +a o+a+a o o"]
Thanks a lot for reading all of that :) (And: which Haskell parsing libraries would you recommend?)
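One possible way to express the BNF above, offered as a sketch rather than the definitive answer: ReadP parsers are sequenced with the Applicative/Monad operators, not with function composition, so "a space followed by an o" can be written inline as spaceParser *> oParser without naming it. The name restParser and the helper parseAll are introduced here for illustration. For ReadP, <|> from Control.Applicative is defined as (+++), so the two can be used interchangeably. Control.Applicative is deliberately not imported, so that many refers unambiguously to the ReadP version.

```haskell
import Data.Char (isSpace)
import Text.ParserCombinators.ReadP

oParser, aParser, pParser, spaceParser :: ReadP Char
oParser = char 'o'
aParser = char 'a'
pParser = char '+'
spaceParser = satisfy isSpace

-- <Expr> ::= <Space> <O> <Expr> | <Plus> <A> <Expr> | <EOL>
-- Each step is an anonymous sequence: the separator is parsed and dropped,
-- and only the letter is kept.
restParser :: ReadP String
restParser = many ((spaceParser *> oParser) +++ (pParser *> aParser))

-- <Input> ::= <O> <Expr> | <A> <Expr>
parseExpr :: ReadP String
parseExpr = do
  first <- oParser +++ aParser
  rest <- restParser
  eof
  pure (first : rest)

-- All complete parses of the input (hypothetical convenience wrapper).
parseAll :: String -> [String]
parseAll s = [x | (x, "") <- readP_to_S parseExpr s]

main :: IO ()
main = print (parseAll "o+a o+a+a o o")
```

Under this reading of the BNF, the sample string "o +a o+a+a o o" from the question yields no parse, because its first space is followed by + rather than o; the string used in main drops that space.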

Haskell: Parsing escape characters in single quotes

I'm currently making a scanner for a basic compiler I'm writing in Haskell. One of the requirements is that any character enclosed in single quotes (') is translated into a character literal token (type T_Char), and this includes escape sequences such as '\n' and '\t'. I've defined this part of the scanner function which works okay for most cases:
scanner ('\'':cs) | (length cs) == 0 = error "Illegal character!"
                  | head cs == '\\' = mkEscape (head (drop 1 cs)) : scanner (drop 3 cs)
                  | head (drop 1 cs) == '\'' = T_Char (head cs) : scanner (drop 2 cs)
  where
    mkEscape :: Char -> Token
    mkEscape 'n' = T_Char '\n'
    mkEscape 'r' = T_Char '\r'
    mkEscape 't' = T_Char '\t'
    mkEscape '\\' = T_Char '\\'
    mkEscape '\'' = T_Char '\''
However, this comes up when I run it in GHCi:
Main> scanner "abc '\\' def"
[T_Id "abc", T_Char '\'', T_Id "def"]
It can recognise everything else but gets escaped backslashes confused with escaped single quotes. Is this something to do with character encodings?
I don't think there's anything wrong with the parser regarding your problem. To Haskell, the string will be read as
abc '\' def
because Haskell also has string escapes. So when it reaches the first quotation mark, cs contains the char sequence \' def. Obviously head cs is a backslash, so it will run mkEscape.
The argument given is head (drop 1 cs), which is ', thus mkEscape will return T_Char '\'', which is what you saw.
Perhaps you should call
scanner "abc '\\\\' def"
The 1st level of \ is for the Haskell interpreter, and the 2nd level is for scanner.
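The two levels of escaping can be checked directly by printing the strings and counting their backslashes. This small program uses only the Prelude; the names oneLevel, twoLevels and countBackslashes are just labels for this illustration.

```haskell
-- What the scanner actually receives for each source literal.
oneLevel, twoLevels :: String
oneLevel = "abc '\\' def" -- the source escape \\ is ONE backslash at runtime
twoLevels = "abc '\\\\' def" -- TWO backslashes at runtime

countBackslashes :: String -> Int
countBackslashes = length . filter (== '\\')

main :: IO ()
main = do
  putStrLn oneLevel -- prints: abc '\' def
  putStrLn twoLevels -- prints: abc '\\' def
  print (countBackslashes oneLevel, countBackslashes twoLevels) -- prints: (1,2)
```

Only the second string contains the two-character sequence that the scanner's escape branch maps to a backslash character literal.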

Resources