Parse string to list in Haskell - parsing

I would like to parse a string into a list in Haskell, but I have no idea how to write it properly.
String structure is:
A [B [], C [D [], E []]]
and it represents structure
A-+-B
  |
  `-C-+-D
      |
      `-E
Any ideas?
Thanks!

Your best bet when parsing in Haskell is almost always Parsec. Here's an example.
import Text.ParserCombinators.Parsec
data Tree a = Tree [Tree a] | Leaf a deriving Show
parseLeaf :: Parser (Tree Char)
parseLeaf = noneOf "[]" >>= return.Leaf
parseNode :: Parser (Tree Char)
parseNode = do
    char '['
    a <- parseTree
    char ']'
    return (Tree a)
parseTree = many1 (parseLeaf <|> parseNode)
Testing this out:
> parseTest parseTree "a[aa]"
[Leaf 'a',Tree [Leaf 'a',Leaf 'a']]
> parseTest parseTree "a[aa]aaa"
[Leaf 'a',Tree [Leaf 'a',Leaf 'a'],Leaf 'a',Leaf 'a',Leaf 'a']
> parseTest parseTree "a[aa[aaaaaa]]aaa"
[Leaf 'a',Tree [Leaf 'a',Leaf 'a',Tree [Leaf 'a',Leaf 'a',Leaf 'a',Leaf 'a',Leaf 'a',Leaf 'a']],Leaf 'a',Leaf 'a',Leaf 'a']
Here's how it works. Parsers in Parsec are monadic, and so support the usual do-notation. (You can also write parsers in applicative style; you can look up how to do that elsewhere.)
We start with a simple data structure
data Tree a = Tree [Tree a] | Leaf a deriving Show
This is not quite the same as what you wanted, but as I wasn't entirely sure what your semantics were, I've used this example. You should be able to adapt it for your purposes.
We then need a parser for each possible part of the tree. A leaf is fairly simple, it is just anything that isn't a bracket:
parseLeaf :: Parser (Tree Char)
parseLeaf = noneOf "[]" >>= return.Leaf
Note that this could have been written out in longhand like
parseLeaf = do
    a <- noneOf "[]"
    return (Leaf a)
Then to parse a branching part of the tree, we need to parse the opening and closing brackets. In between the brackets we can have a whole tree again.
parseNode :: Parser (Tree Char)
parseNode = do
    char '['
    a <- parseTree
    char ']'
    return (Tree a)
So what is parseTree? It is just one or more of either of the parsers we have written. The <|> operator tries the parser on its left first and, if that fails without consuming any input, tries the parser on its right. So we have
parseTree = many1 (parseLeaf <|> parseNode)
You should be able to adapt this to your purposes. It looks like your structure might be somewhat more like this:
data Structure a = Node (Structure a) a a | Leaf a
By following the same principles, working out what parser is needed for each possibility and then combining them, you should be parsing in no time.
UPDATE
Here is a very quick and dirty version of parsing the data structure you asked about. It does not support spaces or commas, but should help demonstrate the basic principle.
data Tree = Tree Char [Tree] deriving Show
parseTree :: Parser Tree
parseTree = do
    character <- noneOf "[]"
    subtree <- parseSubTree
    return $ Tree character subtree
parseSubTree :: Parser [Tree]
parseSubTree = do
    char '['
    trees <- many parseTree
    char ']'
    return trees
And here is a version with the commas and whitespace added in a fairly simple way. There are a lot of useful combinators in the parsec library that could simplify and improve this, you should investigate them yourself. Note also the applicative style used for the symbol shortcut parser definition. Many people prefer applicative style for parsers and it can be a lot more succinct, so that is worth finding out about as well.
data Tree = Tree Char [Tree] deriving Show
symbol :: String -> Parser String
symbol s = string s <* spaces
parseTree :: Parser Tree
parseTree = do
    character <- noneOf "[]"
    spaces
    subtree <- parseSubTree
    return $ Tree character subtree
parseSubTree :: Parser [Tree]
parseSubTree = do
    symbol "["
    trees <- sepBy parseTree (symbol ",")
    symbol "]"
    return trees
Here it is working:
> parseTest parseTree "A [ A [ B [ ] , C [ ], D [ ] ] ] "
Tree 'A' [Tree 'A' [Tree 'B' [],Tree 'C' [],Tree 'D' []]]
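The answer mentions that applicative style can be more succinct. As a rough sketch of what that could look like for the same Tree type, here is a self-contained version checked against the string from the question. It assumes node labels are single letters (letter is used instead of noneOf "[]"), and parseWhole is a hypothetical helper name that is not part of the original answer.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Tree = Tree Char [Tree] deriving (Eq, Show)

-- Parse a fixed string and skip any whitespace after it.
symbol :: String -> Parser String
symbol s = string s <* spaces

-- A node is a single letter followed by a bracketed list of children.
-- Assumption: labels are letters only.
parseTree :: Parser Tree
parseTree = Tree <$> (letter <* spaces) <*> parseSubTree

-- Children are comma-separated and may be empty.
parseSubTree :: Parser [Tree]
parseSubTree = between (symbol "[") (symbol "]") (sepBy parseTree (symbol ","))

-- Hypothetical entry point: require the whole input to be one tree.
parseWhole :: String -> Either ParseError Tree
parseWhole = parse (spaces *> parseTree <* eof) "input"
```

With these definitions, parseWhole "A [B [], C [D [], E []]]" should produce Tree 'A' [Tree 'B' [],Tree 'C' [Tree 'D' [],Tree 'E' []]].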

Related

Using makeExprParser with ambiguity

I'm currently encountering a problem while translating a parser from a CFG-based tool (antlr) to Megaparsec.
The grammar contains lists of expressions (handled with makeExprParser) that are enclosed in brackets (<, >) and separated by ,.
Stuff like <>, <23>, <23,87> etc.
The problem now is that the expressions may themselves contain the > operator (meaning "greater than"), which causes my parser to fail.
<1223>234> should, for example, be parsed into [BinaryExpression ">" (IntExpr 1223) (IntExpr 234)].
I presume that I have to strategically place try somewhere, but the places I tried (to the first argument of sepBy and the first argument of makeExprParser) did unfortunately not work.
Can I use makeExprParser in such a situation or do I have to manually write the expression parser?
This is the relevant part of my parser:
-- uses megaparsec, text, and parser-combinators
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Monad.Combinators.Expr
import Data.Text
import Data.Void
import System.Environment
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L
type BinaryOperator = Text
type Name = Text
data Expr
    = IntExpr Integer
    | BinaryExpression BinaryOperator Expr Expr
    deriving (Eq, Show)
type Parser = Parsec Void Text
lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc
symbol :: Text -> Parser Text
symbol = L.symbol sc
sc :: Parser ()
sc = L.space space1 (L.skipLineComment "//") (L.skipBlockCommentNested "/*" "*/")
parseInteger :: Parser Expr
parseInteger = do
    number <- some digitChar
    _ <- sc
    return $ IntExpr $ read number
parseExpr :: Parser Expr
parseExpr = makeExprParser parseInteger [[InfixL (BinaryExpression ">" <$ symbol ">")]]
parseBracketList :: Parser [Expr]
parseBracketList = do
    _ <- symbol "<"
    exprs <- sepBy parseExpr (symbol ",")
    _ <- symbol ">"
    return exprs
main :: IO ()
main = do
    text : _ <- getArgs
    let res = runParser parseBracketList "stdin" (pack text)
    case res of
        (Right suc) -> do
            print suc
        (Left err) ->
            putStrLn $ errorBundlePretty err
You've (probably) misdiagnosed the problem. Your parser fails on <1233>234> because it's trying to parse > as a left associative operator, like +. In other words, the same way:
1+2+
would fail, because the second + has no right-hand operand, your parser is failing because:
1233>234>
has no digit following the second >. Assuming you don't want your > operator to chain (i.e., 1>2>3 is not a valid Expr), you should first replace InfixL with InfixN (non-associative) in your makeExprParser table. Then, it will parse this example fine.
Unfortunately, with or without this change your parser will still fail on the simpler test case:
<1233>
because the > is interpreted as an operator within a continuing expression.
In other words, the problem isn't that your parser can't handle expressions with > characters, it's that it's overly aggressive in treating > characters as part of an expression, preventing them from being recognized as the closing angle bracket.
To fix this, you need to figure out exactly what you're parsing. Specifically, you need to resolve the ambiguity in your parser by precisely characterizing the situations where > can be part of a continuing expression and where it can't.
One rule that will probably work is to only consider a > as an operator if it is followed by a valid "term" (i.e., a parseInteger). You can do this with lookAhead. The parser:
symbol ">" <* lookAhead term
will parse a > operator only if it is followed by a valid term. If it fails to find a term, it will consume some input (at least the > symbol itself), so you must surround it with a try:
try (symbol ">" <* lookAhead term)
With the above two fixes applied to parseExpr:
parseExpr :: Parser Expr
parseExpr = makeExprParser term
    [[InfixN (BinaryExpression ">" <$ try (symbol ">" <* lookAhead term))]]
  where term = parseInteger
you'll get the following parses:
λ> parseTest parseBracketList "<23>"
[IntExpr 23]
λ> parseTest parseBracketList "<23,87>"
[IntExpr 23,IntExpr 87]
λ> parseTest parseBracketList "<23,87>18>"
[IntExpr 23,BinaryExpression ">" (IntExpr 87) (IntExpr 18)]
However, the following will fail:
λ> parseTest parseBracketList "<23,87>18"
1:10:
  |
1 | <23,87>18
  |          ^
unexpected end of input
expecting ',', '>', or digit
λ>
because the fact that the > is followed by 18 means that it is a valid operator, and it is a parse failure that the valid expression 87>18 is followed by neither a comma nor a closing > angle bracket.
If you need to parse something like <23,87>18, you have bigger problems. Consider the following two test cases:
<1,2>3,4,5,6,7,...,100000000000,100000000001>
<1,2>3,4,5,6,7,...,100000000000,100000000001
It's a challenge to write an efficient parser that will parse the first one as a list of 100000000000 expressions but the second one as a list of two expressions:
[IntExpr 1, IntExpr 2]
followed by some "extra" text. Hopefully, the underlying "language" you're trying to parse isn't so hopelessly broken that this will be an issue.
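For readers who want to try the try/lookAhead idea without installing Megaparsec, the following is a rough port to Parsec, using buildExpressionParser with AssocNone in place of makeExprParser with InfixN. It is a sketch under that substitution, not the original code: gtOp and parseList are hypothetical names, whitespace handling is reduced to spaces, and comment skipping is left out.

```haskell
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String (Parser)

data Expr
  = IntExpr Integer
  | BinaryExpression String Expr Expr
  deriving (Eq, Show)

-- Run a parser, then skip trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: String -> Parser String
symbol = lexeme . string

term :: Parser Expr
term = IntExpr . read <$> lexeme (many1 digit)

-- '>' counts as an operator only when a term follows it.
-- 'try' undoes the consumed '>' when the lookahead fails,
-- so the same character can still close the bracket list.
gtOp :: Parser (Expr -> Expr -> Expr)
gtOp = BinaryExpression ">" <$ try (symbol ">" <* lookAhead term)

parseExpr :: Parser Expr
parseExpr = buildExpressionParser [[Infix gtOp AssocNone]] term

parseBracketList :: Parser [Expr]
parseBracketList = symbol "<" *> sepBy parseExpr (symbol ",") <* symbol ">"

-- Hypothetical entry point: the whole input must be one bracket list.
parseList :: String -> Either ParseError [Expr]
parseList = parse (parseBracketList <* eof) "input"
```

Under these assumptions, parseList "<23,87>18>" should succeed with a two-element list, while parseList "<23,87>18" should fail, matching the behaviour described above.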

Parsing a series of lambda calculus terms

I am writing a lambda calculus parser in Haskell and I can't find a solution to fix its current problem.
How I parse expressions:
expr :: Parser LamExpr
expr = do terms <- some $ token term
          return $ foldl1 LamApp terms
How I parse terms:
term :: Parser LamExpr
term = do symbol "("
          e <- expr
          symbol ")"
          return e
   <|> do symbol "\\"
          x <- var
          symbol "->"
          e <- expr
          return $ LamAbs x e
   <|> do {x <- var; return $ LamVar x}
   <|> do {name <- macroName; return $ LamMacro name}
On input "x1 x2) x3" my parser returns
LamApp (LamVar 1) (LamVar 2)
Parsing should fail as it is syntactically incorrect, but it still parses the first application. I think this is because of do terms <- some $ token term which will parse as much as it can due to some.
How can I fix this so that the whole parsing fails instead of one section?
I assume you are using some parsec variant. You just have to add an eof to the end of your parser.
parseInput = do
    e <- expr
    eof
    pure e -- (*)
Or for short using a Control.Applicative combinator:
parseInput = expr <* eof
(*) btw the community is starting to use pure instead of return these days
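To see the effect of eof in isolation, here is a small self-contained sketch written against Parsec. The question does not say which parser library it uses, so the types below are assumptions: variables are written as x followed by digits, macros are left out, and lexeme and symbol are hypothetical helpers.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data LamExpr
  = LamVar Int
  | LamApp LamExpr LamExpr
  | LamAbs Int LamExpr
  deriving (Eq, Show)

-- Run a parser, then skip trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: String -> Parser String
symbol = lexeme . string

-- Assumption: a variable is 'x' followed by one or more digits.
var :: Parser Int
var = lexeme (char 'x' *> (read <$> many1 digit))

term :: Parser LamExpr
term = between (symbol "(") (symbol ")") expr
   <|> LamAbs <$> (symbol "\\" *> var) <*> (symbol "->" *> expr)
   <|> LamVar <$> var

-- One or more terms, folded into left-nested applications.
expr :: Parser LamExpr
expr = foldl1 LamApp <$> many1 term

-- Top-level parser: fails unless the entire input is consumed.
parseInput :: Parser LamExpr
parseInput = spaces *> expr <* eof
```

With this sketch, parse expr "" "x1 x2) x3" stops at the stray parenthesis and still succeeds, whereas parse parseInput "" "x1 x2) x3" is rejected.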

Haskell - intersperse a parser with another one

I have two parsers parser1 :: Parser a and parser2 :: Parser b.
I would now like to parse a list of as, interspersing them with parser2.
The desired signature is something like
interspersedParser :: Parser b -> Parser a -> Parser [a]
For example, if Parser a parses the 'a' character and Parser b parses the 'b' character, then the interspersedParser should parse
""
"a"
"aba"
"ababa"
...
I'm using megaparsec. Is there already some combinator which behaves like this, which I'm currently not able to find?
In parsec there is a sepBy parser which does that. The same parser seems to be available in megaparsec as well: https://hackage.haskell.org/package/megaparsec-4.4.0/docs/Text-Megaparsec-Combinator.html
Sure, you can use sepBy, but isn't this just:
interspersedParser sepP thingP = (:) <$> thingP <*> many (sepP *> thingP)
EDIT: Oh, this requires at least one thing to be there. You also wanted empty, so just stick a <|> pure [] on the end.
In fact, this is basically how sepBy1 (a variant of sepBy that requires at least one) is implemented:
-- | @sepBy p sep@ parses /zero/ or more occurrences of @p@, separated
-- by @sep@. Returns a list of values returned by @p@.
--
-- > commaSep p = p `sepBy` comma
sepBy :: Alternative m => m a -> m sep -> m [a]
sepBy p sep = sepBy1 p sep <|> pure []
{-# INLINE sepBy #-}
-- | @sepBy1 p sep@ parses /one/ or more occurrences of @p@, separated
-- by @sep@. Returns a list of values returned by @p@.
sepBy1 :: Alternative m => m a -> m sep -> m [a]
sepBy1 p sep = (:) <$> p <*> many (sep *> p)
{-# INLINE sepBy1 #-}
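Because the definition above only needs Alternative, it can be tried with any parser type that has such an instance. The sketch below uses ReadP from base purely so that it runs without extra packages; the question itself is about megaparsec. parseAll and abList are hypothetical helper names.

```haskell
import Control.Applicative (Alternative, many, (<|>))
import Text.ParserCombinators.ReadP (ReadP, char, eof, readP_to_S)

-- Zero or more things separated by separators; separators are discarded.
interspersedParser :: Alternative f => f b -> f a -> f [a]
interspersedParser sepP thingP =
  ((:) <$> thingP <*> many (sepP *> thingP)) <|> pure []

-- Keep only the parses that consume the whole input.
parseAll :: ReadP a -> String -> [a]
parseAll p s = [x | (x, "") <- readP_to_S (p <* eof) s]

-- 'a' characters separated by 'b' characters.
abList :: String -> [String]
abList = parseAll (interspersedParser (char 'b') (char 'a'))
```

Note that the result contains only the parsed things, so abList "aba" yields the two 'a' characters and drops the 'b'. An input with a trailing separator such as "ab" has no complete parse.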

How to combine two parsers in Haskell ReadP anonymously?

The Problem
I want to chain two ReadP parsers anonymously.
Example
Input characters are in {'a', 'o', '+', ' '} where ' ' is a space.
I want to parse this input according to the following rules:
the first o or a is not preceded by a space/plus
every other o is preceded by a space
every other a is preceded by a plus
The BNF
In case this is not clear, I came up with the following BNF:
{-
<Input> ::= <O> <Expr> | <A> <Expr>
<Expr> ::= <Space> <O> <Expr> | <Plus> <A> <Expr> | <EOL>
<O> ::= "o"
<A> ::= "a"
<Space> ::= " "
<Plus> ::= "+"
-}
Specific questions
The idea is that the rules "o preceded by space" and "a preceded by plus"
(if not at the beginning) should not be part of the o or a parser, nor do
I want to create an explicit/named parser, because that is a part of the
definition of an expression (Expr).
How do I chain the oParser with the spaceParser?
Composition (does not work): spaceParser . oParser
(+++) is the symmetric choice: oParser +++ spaceParser. And related to that: Is (+++) equivalent to <|> from Control.Applicative?
ReadP.choice [oParser, spaceParser] creates a new parser, which obviously does not work for a string, but I did that previously and the behaviour of choice surprised me. How does choice work?
The Code
--------------------------------------------------------------------------------
module Parser (main) where
--------------------------------------------------------------------------------
-- modules
import Text.ParserCombinators.ReadP as ReadP
import Data.Char as D
import Control.Applicative
--------------------------------------------------------------------------------
-- this parses one o
oParser :: ReadP Char
oParser = satisfy (== 'o')
-- this parses one a
aParser :: ReadP Char
aParser = satisfy (== 'a') --(\v -> v == 'a')
-- this parses one plus
pParser :: ReadP Char
pParser = satisfy (== '+')
spaceParser :: ReadP Char
spaceParser = satisfy D.isSpace
parseExpr :: ReadP String
parseExpr = -- ReadP.choice [oParser, spaceParser]
main :: IO ()
main = print [ x | (x, "") <- ReadP.readP_to_S parseExpr "o +a o+a+a o o"]
Thank you a lot for reading all of that :) (And: Which Haskell parsing libraries would you recommend?)
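One possible way to chain parsers without naming the combination is the applicative sequencing operator *>, which runs two parsers in order and keeps the second result. The sketch below follows the BNF from the question; it is one reading of the grammar, not a confirmed answer from the thread. parseInputs is a hypothetical helper, and spaceParser is narrowed to a literal space. For ReadP in base, <|> from Control.Applicative is defined as (+++), so the two should behave the same.

```haskell
import Text.ParserCombinators.ReadP

oParser, aParser, pParser, spaceParser :: ReadP Char
oParser     = char 'o'
aParser     = char 'a'
pParser     = char '+'
spaceParser = char ' '

-- <Input> ::= (<O> | <A>) followed by any number of
--             (<Space> <O>) or (<Plus> <A>), then end of input.
parseExpr :: ReadP String
parseExpr = do
  first <- oParser +++ aParser
  -- (*>) sequences two parsers and keeps only the second result,
  -- so the separator never needs a named parser of its own.
  rest <- many ((spaceParser *> oParser) +++ (pParser *> aParser))
  eof
  return (first : rest)

-- Collect the results of all complete parses.
parseInputs :: String -> [String]
parseInputs s = [x | (x, "") <- readP_to_S parseExpr s]
```

Under this reading, "o+a o+a" is accepted, while the sample input "o +a o+a+a o o" from the question is rejected, because its first space is followed by a plus rather than an o.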

Haskell Parsec Parser for Encountering [...]

I'm attempting to write a parser in Haskell using Parsec. Currently I have a program that can parse
test x [1,2,3] end
The code that does this is given as follows
testParser = do {
reserved "test";
v <- identifier;
symbol "[";
l <- sepBy natural commaSep;
symbol "]";
p <- pParser;
return $ Test v (List l) p
} <?> "end"
where commaSep is defined as
commaSep = skipMany1 (space <|> char ',')
Now is there some way for me to parse a similar statement, specifically:
test x [1...3] end
Being new to Haskell, and Parsec for that matter, I'm sure there's some nice concise way of doing this that I'm just not aware of. Any help would be appreciated.
Thanks again.
I'll be using some functions from Control.Applicative like (*>). These functions are useful if you want to avoid the monadic interface of Parsec and prefer the applicative interface, because the parsers become easier to read that way in my opinion.
If you aren't familiar with the basic applicative functions, leave a comment and I'll explain them. You can look them up on Hoogle if you are unsure.
As I've understood your problem, you want a parser for some data structure like this:
data Test = Test String Numbers
data Numbers = List [Int] | Range Int Int
A parser that can parse such a data structure would look like this (I've not compiled the code, but it should work):
-- parses "test <identifier> [<numbers>] end"
testParser :: Parser Test
testParser =
    Test <$> (reserved "test" *> identifier)
         <*> (symbol "[" *> numbersParser <* symbol "]")
         <*  reserved "end"
         <?> "test"
numbersParser :: Parser Numbers
numbersParser = try rangeParser <|> listParser
-- parses "<natural>, <natural>, <natural>" etc
listParser :: Parser Numbers
listParser =
    List <$> sepBy natural (symbol ",")
         <?> "list"
-- parses "<natural> ... <natural>"
rangeParser :: Parser Numbers
rangeParser =
    Range <$> natural <* symbol "..."
          <*> natural
          <?> "range"
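Since the answer's code was not compiled, here is a self-contained sketch that should build against Parsec. Several details are assumptions: the token parsers come from Text.Parsec.Token with emptyDef, the numbers are Integer because that is what Token's natural returns, the sub-parsers are grouped with parentheses because <$>, <*>, *> and <* all share the same precedence, and the range parser is tried before the list parser because sepBy would otherwise succeed after reading just the first number of a range.

```haskell
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

data Test = Test String Numbers deriving (Eq, Show)
data Numbers = List [Integer] | Range Integer Integer deriving (Eq, Show)

-- Assumed lexer setup; the question does not show its own definitions.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef { Tok.reservedNames = ["test", "end"] }

reserved :: String -> Parser ()
reserved = Tok.reserved lexer

identifier :: Parser String
identifier = Tok.identifier lexer

symbol :: String -> Parser String
symbol = Tok.symbol lexer

natural :: Parser Integer
natural = Tok.natural lexer

-- parses "test <identifier> [<numbers>] end"
testParser :: Parser Test
testParser =
  Test <$> (reserved "test" *> identifier)
       <*> (symbol "[" *> numbersParser <* symbol "]")
       <*  reserved "end"
       <?> "test"

-- Range first: it backtracks cleanly when no "..." follows the number.
numbersParser :: Parser Numbers
numbersParser = try rangeParser <|> listParser

listParser :: Parser Numbers
listParser = List <$> sepBy natural (symbol ",") <?> "list"

rangeParser :: Parser Numbers
rangeParser = Range <$> natural <* symbol "..." <*> natural <?> "range"
```

Running parse (testParser <* eof) "" on "test x [1...3] end" and on "test x [1,2,3] end" should give a Range and a List respectively.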

Resources