Chain two parsers in Haskell (Parsec) - parsing

Parsec provides an operator to choose between two parsers:
(<|>)
  :: Text.Parsec.Prim.ParsecT s u m a
  -> Text.Parsec.Prim.ParsecT s u m a
  -> Text.Parsec.Prim.ParsecT s u m a
Is there a similar function to chain two parsers? I didn't find one with the same signature using Hoogle.
As an example, let's say I want to parse any word optionally followed by a single digit. My first idea was to use >> but it doesn't seem to work.
parser = many1 letter >> optional (fmap pure digit)
I used fmap pure in order to convert the digit to an actual string and thus match the parsed type of many1 letter. I don't know if it is useful.

Try this:
parser = (++) <$> many1 letter <*> option "" (fmap pure digit)
This is equivalent to:
parser = pure (++) <*> many1 letter <*> option "" (fmap pure digit)
option "" (fmap pure digit) returns the empty string if the digit parser fails, and a one-character string containing the digit otherwise.
You can also use do-notation for more readable code:
parser = do
  s1 <- many1 letter
  s2 <- option "" (fmap pure digit)
  return (s1 ++ s2)
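As a quick check, the applicative version can be loaded into GHCi as a small module. This is only a minimal sketch assuming the parsec package; the names wordWithDigit and runWord are made up for illustration and are not part of Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A word, optionally followed by one digit, returned as a single string.
wordWithDigit :: Parser String
wordWithDigit = (++) <$> many1 letter <*> option "" (fmap pure digit)

-- Hypothetical helper: run the parser, turning a parse error into Nothing.
runWord :: String -> Maybe String
runWord = either (const Nothing) Just . parse wordWithDigit ""
```

Here runWord "abc1" and runWord "abc" give Just "abc1" and Just "abc", while an input that starts with a digit is rejected because many1 letter needs at least one letter.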

Related

Haskell - intersperse a parser with another one

I have two parsers parser1 :: Parser a and parser2 :: Parser b.
I would now like to parse a list of as, interspersing them with parser2.
The desired signature is something like
interspersedParser :: Parser b -> Parser a -> Parser [a]
For example, if Parser a parses the 'a' character and Parser b parses the 'b' character, then the interspersedParser should parse
""
"a"
"aba"
"ababa"
...
I'm using megaparsec. Is there already some combinator which behaves like this, which I'm currently not able to find?
In parsec there is a sepBy parser which does that. The same parser seems to be available in megaparsec as well: https://hackage.haskell.org/package/megaparsec-4.4.0/docs/Text-Megaparsec-Combinator.html
Sure, you can use sepBy, but isn't this just:
interspersedParser sepP thingP = (:) <$> thingP <*> many (sepP *> thingP)
EDIT: Oh, this requires at least one thing to be there. You also wanted empty, so just stick a <|> pure [] on the end.
In fact, this is basically how sepBy1 (a variant of sepBy that requires at least one) is implemented:
-- | @sepBy p sep@ parses /zero/ or more occurrences of @p@, separated
-- by @sep@. Returns a list of values returned by @p@.
--
-- > commaSep p = p `sepBy` comma
sepBy :: Alternative m => m a -> m sep -> m [a]
sepBy p sep = sepBy1 p sep <|> pure []
{-# INLINE sepBy #-}
-- | @sepBy1 p sep@ parses /one/ or more occurrences of @p@, separated
-- by @sep@. Returns a list of values returned by @p@.
sepBy1 :: Alternative m => m a -> m sep -> m [a]
sepBy1 p sep = (:) <$> p <*> many (sep *> p)
{-# INLINE sepBy1 #-}
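For completeness, here is a small sketch of the requested signature written with Parsec's sepBy (the question uses megaparsec, so treat the imports as an assumption; the combinator itself has the same shape there). The helper runAB is a made-up name that just runs the parser on the 'a'/'b' example and requires the whole input to match.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same argument order as in the question: separator first, element second.
interspersedParser :: Parser b -> Parser a -> Parser [a]
interspersedParser sepP thingP = thingP `sepBy` sepP

-- Hypothetical helper: parse 'a's separated by 'b's up to end of input.
runAB :: String -> Maybe String
runAB = either (const Nothing) Just
      . parse (interspersedParser (char 'b') (char 'a') <* eof) ""
```

With this, "", "a" and "aba" are accepted, while "ab" is rejected: the separator has already been consumed when the next element turns out to be missing.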

How to do a backtrack search with parser combinators?

I have a list of parsers e.g. [string "a",string "ab"] that are "overlapping". I can change neither the parsers themselves nor their order.
With these parsers I want to parse a sequence of tokens that would each be exact matches for one of the parsers e.g. "aaaab", "ab", "abab" but not "abb"
Without parsers I would just implement a depth-first search, but I would like to solve this with parsers.
I get about this far:
import Control.Applicative
import Text.Trifecta
parsers = [string "a",string "ab"]
parseString (many (choice parsers) <* eof) mempty "aab"
This fails because it will parse "a" both times, and not backtrack because choice doesn't do that. And further, string "a" has succeeded both times so the consumed input probably can't be retrieved anymore.
How can I implement a parser that can backtrack and produce a list of parse results, e.g. Success ["a","ab"]?
If I require the input to have the tokens separated, I still can't make it work:
This works:
parseString (try (string "a" <* eof) <|> (string "ab" <*eof)) mempty "ab"
But this does not:
parseString (try (foldl1 (<|>) $ map (\x -> x <* eof) parsers)) mempty "ab"
The try is applied at too high a level. You should apply it to the individual parsers. For example:
parseString (foldl1 (<|>) $ map (\x -> try (x <* eof)) parsers) mempty "ab"
In the original parser you wrote:
parseString ((try (string "a" <* eof)) <|> (string "ab" <*eof)) mempty "ab"
Notice that the left operand of <|> is try (string "a" <* eof), with try included,
whereas the version you built with foldl1 is equivalent to:
parseString (try ((string "a" <* eof) <|> (string "ab" <*eof))) mempty "ab"
So here the try is not part of the left operand. As a result, if the first parser fails, the "cursor" will not return to the point where it made the decision to try the first operand.
We can improve the above by making use of asum :: (Foldable t, Alternative f) => t (f a) -> f a:
import Data.Foldable(asum)
parseString (asum (map (\x -> try (x <* eof)) parsers)) mempty "ab"
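The difference between the two placements of try can be reproduced in a few lines. Note the assumption: this sketch uses Parsec rather than Trifecta, and relies on Parsec's rule that <|> only tries its right operand when the left one failed without consuming input. The names perAlternative, aroundChoice and run are made up for the example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

parsers :: [Parser String]
parsers = [string "a", string "ab"]

-- try around each alternative: a failed alternative rewinds the input,
-- so the next alternative starts from the same position.
perAlternative :: Parser String
perAlternative = choice (map (\p -> try (p <* eof)) parsers)

-- try around the whole choice: the first alternative still consumes "a"
-- before failing on eof, so <|> never gets to the second alternative.
aroundChoice :: Parser String
aroundChoice = try (choice (map (<* eof) parsers))

-- Hypothetical helper: run a parser and discard the error message.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

Here run perAlternative "ab" gives Just "ab", whereas run aroundChoice "ab" gives Nothing. Both accept "a", because the first alternative already matches it completely.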

How to write a parsec parser for a list of interspersed elements?

Let's say the input looks something like foo#1 bar baz-3.qux [...]. I want to write a parser that only consumes the input up until the first space before the [, which means foo#1 bar baz-3.qux (without the trailing space).
How should I approach this using parsec?
I can imagine something like
foo = many1 $ letter <|> digit <|> oneOf " #-."
but this consumes even the space at the end, which I'd like to avoid. What is a general approach to parsing a list of things interspersed with another thing? (Imagine it's not just a space, but something that would also need to be parsed).
P.S: I'm looking for the most general solution possible, not a clever hack that solves this particular example.
I think what you're looking for is exactly notFollowedBy. Something like
foo = many1 $ letter
  <|> digit
  <|> oneOf "#-."
  <|> (try $ char ' ' >> notFollowedBy (char '[') >> return ' ')
You can abstract out the pattern to get the general function of course:
endedBy :: (Show y) => Parser x -> Parser x -> Parser y -> Parser [x]
endedBy p final terminal = many1 $ p <|> t where
  t = try $ do
    x <- final
    notFollowedBy terminal
    return x
foo' = endedBy (letter <|> digit <|> oneOf "#-.") (char ' ') (char '[')
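If the pieces between the separators are wanted as a list rather than as one flat string, the same notFollowedBy idea can be combined with sepBy1. The following is a variation on the answer, not the answerer's code, and assumes Parsec; word, separator, wordsBeforeBracket and runWords are names invented for this sketch.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One element of the list: a run of the allowed non-space characters.
word :: Parser String
word = many1 (letter <|> digit <|> oneOf "#-.")

-- A space only counts as a separator if it is not followed by '['.
-- try makes the failing case give back the space it consumed.
separator :: Parser Char
separator = try (char ' ' <* notFollowedBy (char '['))

wordsBeforeBracket :: Parser [String]
wordsBeforeBracket = word `sepBy1` separator

-- Hypothetical helper: run the parser, ignoring any unconsumed input.
runWords :: String -> Maybe [String]
runWords = either (const Nothing) Just . parse wordsBeforeBracket ""
```

On the input from the question, runWords "foo#1 bar baz-3.qux [...]" gives Just ["foo#1","bar","baz-3.qux"], and the space before the bracket is left unconsumed.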

Haskell Parsec Parser for Encountering [...]

I'm attempting to write a parser in Haskell using Parsec. Currently I have a program that can parse
test x [1,2,3] end
The code that does this is given as follows
testParser = do {
  reserved "test";
  v <- identifier;
  symbol "[";
  l <- sepBy natural commaSep;
  symbol "]";
  p <- pParser;
  return $ Test v (List l) p
} <?> "end"
where commaSep is defined as
commaSep = skipMany1 (space <|> char ',')
Now is there some way for me to parse a similar statement, specifically:
test x [1...3] end
Being new to Haskell, and Parsec for that matter, I'm sure there's some nice concise way of doing this that I'm just not aware of. Any help would be appreciated.
Thanks again.
I'll be using some functions from Control.Applicative like (*>). These functions are useful if you want to avoid the monadic interface of Parsec and prefer the applicative interface, because the parsers become easier to read that way in my opinion.
If you aren't familiar with the basic applicative functions, leave a comment and I'll explain them. You can look them up on Hoogle if you are unsure.
As I've understood your problem, you want a parser for some data structure like this:
data Test = Test String Numbers
data Numbers = List [Int] | Range Int Int
A parser that can parse such a data structure would look like this (I've not compiled the code, but it should work):
-- parses "test <identifier> [<numbers>] end"
-- (the parentheses matter: <$>, <*>, *> and <* all share one precedence
-- and associate to the left)
testParser :: Parser Test
testParser =
  Test <$> (reserved "test" *> identifier)
       <*> (symbol "[" *> numbersParser <* symbol "]")
       <*  reserved "end"
       <?> "test"
-- try the range first: listParser would accept the leading "1" of "1...3"
-- as a one-element list and never give rangeParser a chance
numbersParser :: Parser Numbers
numbersParser = try rangeParser <|> listParser
-- parses "<natural>, <natural>, <natural>" etc
listParser :: Parser Numbers
listParser =
  List <$> sepBy natural (symbol ",")
       <?> "list"
-- parses "<natural> ... <natural>"
rangeParser :: Parser Numbers
rangeParser =
  Range <$> natural <* symbol "..."
        <*> natural
        <?> "range"
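To try the idea without the asker's lexer, here is a self-contained sketch for Parsec. The helpers lexeme, symbol, natural and identifier are hand-rolled stand-ins (assumptions, not the Text.Parsec.Token versions), and symbol "test" replaces reserved. It attempts the range before the list, since a sepBy-based list parser would happily accept the leading 1 of 1...3 as a one-element list.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Numbers = List [Int] | Range Int Int deriving (Eq, Show)
data Test = Test String Numbers deriving (Eq, Show)

-- Simplified stand-ins for the token parsers used in the question.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: String -> Parser String
symbol = lexeme . try . string

natural :: Parser Int
natural = lexeme (read <$> many1 digit)

identifier :: Parser String
identifier = lexeme (many1 letter)

rangeParser :: Parser Numbers
rangeParser = Range <$> natural <* symbol "..." <*> natural

listParser :: Parser Numbers
listParser = List <$> sepBy natural (symbol ",")

-- try lets a failed range attempt rewind before the list is attempted.
numbersParser :: Parser Numbers
numbersParser = try rangeParser <|> listParser

testParser :: Parser Test
testParser =
  Test <$> (symbol "test" *> identifier)
       <*> (symbol "[" *> numbersParser <* symbol "]")
       <*  symbol "end"

-- Hypothetical helper: run testParser and discard the error message.
runTest :: String -> Maybe Test
runTest = either (const Nothing) Just . parse testParser ""
```

With these definitions runTest "test x [1,2,3] end" yields Just (Test "x" (List [1,2,3])) and runTest "test x [1...3] end" yields Just (Test "x" (Range 1 3)).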

Parsing in Haskell for a simple interpreter

I'm relatively new to Haskell with main programming background coming from OO languages. I am trying to write an interpreter with a parser for a simple programming language. So far I have the interpreter at a state which I am reasonably happy with, but am struggling slightly with the parser.
Here is the piece of code which I am having problems with
data IntExp
  = IVar Var
  | ICon Int
  | Add IntExp IntExp
  deriving (Read, Show)
whitespace = many1 (char ' ')
parseICon :: Parser IntExp
parseICon =
  do x <- many (digit)
     return (ICon (read x :: Int))
parseIVar :: Parser IntExp
parseIVar =
  do x <- many (letter)
     prime <- string "'" <|> string ""
     return (IVar (x ++ prime))
parseIntExp :: Parser IntExp
parseIntExp =
  do x <- try(parseICon)<|>try(parseIVar)<|>parseAdd
     return x
parseAdd :: Parser IntExp
parseAdd =
  do x <- parseIntExp
     whitespace
     string "+"
     whitespace
     y <- parseIntExp
     return (Add x y)
runP :: Show a => Parser a -> String -> IO ()
runP p input
  = case parse p "" input of
      Left err ->
        do putStr "parse error at "
           print err
      Right x -> print x
The language is slightly more complex, but this is enough to show my problem.
So in the type IntExp ICon is a constant and IVar is a variable, but now onto the problem. This for example runs successfully
runP parseAdd "5 + 5"
which gives (Add (ICon 5) (ICon 5)), which is the expected result. The problem arises when using IVars rather than ICons, e.g.
runP parseAdd "n + m"
This causes the program to error out saying there was an unexpected "n" where a digit was expected. This leads me to believe that parseIntExp isn't working as I intended. My intention was that it will try to parse an ICon, if that fails then try to parse an IVar and so on.
So I either think the problem exists in parseIntExp, or that I am missing something in parseIVar and parseICon.
I hope I've given enough info about my problem and I was clear enough.
Thanks for any help you can give me!
Your problem is actually in parseICon:
parseICon =
do x <- many (digit)
return (ICon (read x :: Int))
The many combinator matches zero or more occurrences, so it's succeeding on "m" by matching zero digits, then probably dying when read fails.
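The difference is easy to see in isolation. A minimal sketch, assuming Parsec; run, zeroOrMoreDigits and oneOrMoreDigits are names made up for the example:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: run a parser and discard the error message.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""

-- many accepts zero digits, so this succeeds on "m" with an empty string.
zeroOrMoreDigits :: Parser String
zeroOrMoreDigits = many digit

-- many1 needs at least one digit, so this fails on "m" without consuming
-- input, which lets a following <|> alternative run.
oneOrMoreDigits :: Parser String
oneOrMoreDigits = many1 digit
```

Here run zeroOrMoreDigits "m" is Just "" (and read "" :: Int then blows up when the value is forced), while run oneOrMoreDigits "m" is Nothing and run (oneOrMoreDigits <|> many1 letter) "m" is Just "m".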
And while I'm at it, since you're new to Haskell, here's some unsolicited advice:
Don't use spurious parentheses. many (digit) should just be many digit. Parentheses here just group things, they're not necessary for function application.
You don't need to do ICon (read x :: Int). The data constructor ICon can only take an Int, so the compiler can figure out what you meant on its own.
You don't need try around the first two options in parseIntExp as it stands--there's no input that would result in either one consuming some input before failing. They'll either fail immediately (which doesn't need try) or they'll succeed after matching a single character.
It's usually a better idea to tokenize first before parsing. Dealing with whitespace at the same time as syntax is a headache.
It's common in Haskell to use the ($) operator to avoid parentheses. It's just function application, but with very low precedence, so that something like many1 (char ' ') can be written many1 $ char ' '.
Also, doing this sort of thing is redundant and unnecessary:
parseICon :: Parser IntExp
parseICon =
  do x <- many digit
     return (ICon (read x))
When all you're doing is applying a regular function to the result of a parser, you can just use fmap:
parseICon :: Parser IntExp
parseICon = fmap (ICon . read) (many digit)
They're the exact same thing. You can make things look even nicer if you import the Control.Applicative module, which gives you an operator version of fmap, called (<$>), as well as another operator (<*>) that lets you do the same thing with functions of multiple arguments. There are also operators (<*) and (*>) that discard the right or left values, respectively, which in this case lets you parse something while discarding the result, e.g., whitespace and such.
Here's a lightly modified version of your code with some of the above suggestions applied and some other minor stylistic tweaks:
whitespace = many1 $ char ' '
parseICon :: Parser IntExp
parseICon = ICon . read <$> many1 digit
parseIVar :: Parser IntExp
parseIVar = IVar <$> parseVarName
parseVarName :: Parser String
parseVarName = (++) <$> many1 letter <*> parsePrime
parsePrime :: Parser String
parsePrime = option "" $ string "'"
parseIntExp :: Parser IntExp
parseIntExp = parseICon <|> parseIVar <|> parseAdd
parsePlusWithSpaces :: Parser ()
parsePlusWithSpaces = whitespace *> string "+" *> whitespace *> pure ()
parseAdd :: Parser IntExp
parseAdd = Add <$> parseIntExp <* parsePlusWithSpaces <*> parseIntExp
I'm also new to Haskell, just wondering:
will parseIntExp ever make it to parseAdd?
It seems like ICon or IVar will always get parsed before reaching 'parseAdd'.
e.g. runP parseIntExp "3 + m"
would try parseICon, and succeed, giving
(ICon 3) instead of (Add (ICon 3) (IVar m))
Sorry if I'm being stupid here, I'm just unsure.
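That concern looks justified: with parseICon <|> parseIVar <|> parseAdd, the first two alternatives win whenever the input starts with a digit or a letter, and if both fail, parseAdd immediately calls parseIntExp again (left recursion). One common way around this in Parsec is chainl1, which parses one operand and then repeatedly an operator followed by another operand. The following is only a sketch under that assumption, not the answerer's code; parseAtom, plus and runExp are made-up names, and Var is assumed to be String.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

type Var = String

data IntExp = IVar Var | ICon Int | Add IntExp IntExp
  deriving (Eq, Show)

parseICon :: Parser IntExp
parseICon = ICon . read <$> many1 digit

parseIVar :: Parser IntExp
parseIVar = IVar <$> ((++) <$> many1 letter <*> option "" (string "'"))

-- Operands only: no recursion back into the addition parser.
parseAtom :: Parser IntExp
parseAtom = parseICon <|> parseIVar

-- The operator, with optional surrounding whitespace. try makes it fail
-- without consuming input when no '+' follows, which ends the chain.
plus :: Parser (IntExp -> IntExp -> IntExp)
plus = Add <$ try (spaces *> char '+' <* spaces)

-- atom (+ atom)*, combined left-associatively.
parseIntExp :: Parser IntExp
parseIntExp = parseAtom `chainl1` plus

-- Hypothetical helper: parse a whole string or give Nothing.
runExp :: String -> Maybe IntExp
runExp = either (const Nothing) Just . parse (parseIntExp <* eof) ""
```

With this, runExp "3 + m" gives Just (Add (ICon 3) (IVar "m")), and "1 + 2 + 3" nests to the left as Add (Add (ICon 1) (ICon 2)) (ICon 3).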
