Haskell - intersperse a parser with another one - parsing

I have two parsers parser1 :: Parser a and parser2 :: Parser a.
I would like now to parse a list of as interspersing them with parser2
The desired signature is something like
interspersedParser :: Parser b -> Parser a -> Parser [a]
For example, if Parser a parses the 'a' character and Parser b parser the 'b' character, then the interspersedParser should parse
""
"a"
"aba"
"ababa"
...
I'm using megaparsec. Is there already some combinator which behaves like this, which I'm currently not able to find?

In parsec there is a sepBy parser which does that. The same parser seems to be available in megaparsec as well: https://hackage.haskell.org/package/megaparsec-4.4.0/docs/Text-Megaparsec-Combinator.html

Sure, you can use sepBy, but isn't this just:
interspersedParser sepP thingP = (:) <$> thingP <*> many (sepP *> thingP)
EDIT: Oh, this requires at least one thing to be there. You also wanted empty, so just stick a <|> pure [] on the end.
In fact, this is basically how sepBy1 (a variant of sepBy that requires at least one) is implemented:
-- | #sepBy p sep# parses /zero/ or more occurrences of #p#, separated
-- by #sep#. Returns a list of values returned by #p#.
--
-- > commaSep p = p `sepBy` comma
sepBy :: Alternative m => m a -> m sep -> m [a]
sepBy p sep = sepBy1 p sep <|> pure []
{-# INLINE sepBy #-}
-- | #sepBy1 p sep# parses /one/ or more occurrences of #p#, separated
-- by #sep#. Returns a list of values returned by #p#.
sepBy1 :: Alternative m => m a -> m sep -> m [a]
sepBy1 p sep = (:) <$> p <*> many (sep *> p)
{-# INLINE sepBy1 #-}

Related

Chain two parsers in Haskell (Parsec)

Parsec provides an operator to choose between two parsers:
(<|>)
:: Text.Parsec.Prim.ParsecT s u m a
-> Text.Parsec.Prim.ParsecT s u m a
-> Text.Parsec.Prim.ParsecT s u m a
Is there a similar function to chain two parsers? I didn't find one with the same signature using Hoogle.
As an example, let's say I want to parse any word optionally followed by a single digit. My first idea was to use >> but it doesn't seem to work.
parser = many1 letter >> optional (fmap pure digit)
I used fmap pure in order to convert the digit to an actual string and thus match the parsed type of many1 letter. I don't know if it is useful.
Try this:
parser = (++) <$> many1 letter <*> option "" (fmap pure digit)
This is equivalent to:
parser = pure (++) <*> many1 letter <*> option "" (fmap pure digit)
option [] (fmap pure digit) return empty string if the parser digit have failed and a string from one digital char otherwise.
You can also use do-notation for more readable code:
parser = do
s1 <- many1 letter
s2 <- option "" (fmap pure digit)
return (s1 ++ s2)

Applicative parser stuck in infinite loop

I'm trying to implement my own Applicative parser, here's the code I use:
{-# LANGUAGE ApplicativeDo, LambdaCase #-}
module Parser where
-- Implementation of an Applicative Parser
import Data.Char
import Control.Applicative (some, many, empty, (<*>), (<$>), (<|>), Alternative)
data Parser a = Parser { runParser :: String -> [(a, String)] }
instance Functor Parser where
-- fmap :: (a -> b) -> (Parser a -> Parser b)
fmap f (Parser p) = Parser (\s -> [(f a, s') | (a,s') <- p s])
instance Applicative Parser where
-- pure :: a -> Parser a
-- <*> :: Parser (a -> b) -> Parser a -> Parser b
pure x = Parser $ \s -> [(x, s)]
(Parser pf) <*> (Parser p) = Parser $ \s ->
[(f a, s'') | (f, s') <- pf s, (a, s'') <- p s']
instance Alternative Parser where
-- empty :: Parser a
-- <|> :: Parser a -> Parser a -> Parser a
empty = Parser $ \_s -> []
(Parser p1) <|> (Parser p2) = Parser $ \s ->
case p1 s of [] -> p2 s
xs -> xs
char :: Char -> Parser Char
char c = Parser $ \case (c':cs) | c == c' -> [(c,cs)] ; _ -> []
main = print $ runParser (some $ char 'A') "AAA"
When I run it, it gets stuck and never returns. After digging into the problem I pinpointed the root cause to be my implementation of the <|> method. If I use the following implementation then everything goes as expected:
instance Alternative Parser where
empty = Parser $ \_s -> []
p1 <|> p2 = Parser $ \s ->
case runParser p1 s of [] -> runParser p2 s
xs -> xs
These two implementations are, in my understanding, quite equivalent. What I guess is that this may have something to do with Haskell's lazy evaluation scheme. Can someone explain what's going on?
Fact "star": in your implementation of (<*>):
Parser p1 <*> Parser p2 = ...
...we must compute enough to know that both arguments are actually applications of the Parser constructor to something before we may proceed to the right-hand side of the equation.
Fact "pipe strict": in this implementation:
Parser p1 <|> Parser p2 = ...
...we must compute enough to know that both parsers are actually applications of the Parser constructor to something before we may proceed to the right-hand side of the equals sign.
Fact "pipe lazy": in this implementation:
p1 <|> p2 = Parser $ ...
...we may proceed to the right-hand side of the equals sign without doing any computation on p1 or p2.
This is important, because:
some v = some_v where
some_v = pure (:) <*> v <*> (some_v <|> pure [])
Let's take your first implementation, the one about which we know the "pipe strict" fact. We want to know if some_v is an application of Parser to something. Thanks to fact "star", we must therefore know whether pure (:), v, and some_v <|> pure [] are applications of Parser to something. To know this last one, by fact "pipe strict", we must know whether some_v and pure [] are applications of Parser to something. Whoops! We just showed that to know whether some_v is an application of Parser to something, we need to know whether some_v is an application of Parser to something -- an infinite loop!
On the other hand, with your second implementation, to check whether some_v is a Parser _, we still must check pure (:), v, and some_v <|> pure [], but thanks to fact "pipe lazy", that's all we need to check -- we can be confident that some_v <|> pure [] is a Parser _ without first checking recursively that some_v and pure [] are.
(And next, you will learn about newtype -- and be confused yet again when changing from data to newtype makes both implementation work!)

How to do a backtrack search with parser combinators?

I have a list of parsers e.g. [string "a",string "ab"] that are "overlapping". I can change neither the parsers themselves nor their order.
With these parsers I want to parse a sequence of tokens that would each be exact matches for one of the parsers e.g. "aaaab", "ab", "abab" but not "abb"
Without parsers I would just implement a dept first search, but I would like to solve this with parsers.
I get about this far:
import Control.Applicative
import Text.Trifecta
parsers = [string "a",string "ab"]
parseString (many (choice parsers) <* eof) mempty "aab"
This fails because it will parse "a" both times, and not backtrack because choice doesn't do that. And further, string "a" has succeeded both times so the consumed input probably can't be retrieved anymore.
How can implement a parser that can backtrack and produce a list of parse results e.g. Success ["a","ab"]?
If I require the input to have the tokens separated, I still can't make it work:
This works:
parseString (try (string "a" <* eof) <|> (string "ab" <*eof)) mempty "ab"
But this does not:
parseString (try (foldl1 (<|>) $ map (\x -> x <* eof) parsers)) mempty "ab"
The try level is performed too high. You should perform it on the individual parsers. For example:
parseString (foldl1 (<|>) $ map (\x -> try (x <* eof)) parsers) mempty "ab"
In the original parser you wrote:
parseString ((try (string "a" <* eof)) <|> (string "ab" <*eof)) mempty "ab"
Notice that the left operand of <|> is try (string "a" <* eof) with try included.
whereas in the one you performed with foldl1, you wrote:
parseString (try ((string "a" <* eof) <|> (string "ab" <*eof))) mempty "ab"
So here is the try not part of the left operand. As a result, if the first parser fails, the "cursor" will not return to the point where it made the decision to try the first operand.
We can improve the above, by making use of asum :: (Foldable t, Alternative f) -> t (f a) -> f a:
import Data.Foldable(asum)
parseString (asum (map (\x -> try (x <* eof)) parsers)) mempty "ab"

Haskell Parsec Parser for Encountering [...]

I'm attempting to write a parser in Haskell using Parsec. Currently I have a program that can parse
test x [1,2,3] end
The code that does this is given as follows
testParser = do {
reserved "test";
v <- identifier;
symbol "[";
l <- sepBy natural commaSep;
symbol "]";
p <- pParser;
return $ Test v (List l) p
} <?> "end"
where commaSep is defined as
commaSep = skipMany1 (space <|> char ',')
Now is there some way for me to parse a similar statement, specifically:
test x [1...3] end
Being new to Haskell, and Parsec for that matter, I'm sure there's some nice concise way of doing this that I'm just not aware of. Any help would be appreciated.
Thanks again.
I'll be using some functions from Control.Applicative like (*>). These functions are useful if you want to avoid the monadic interface of Parsec and prefer the applicative interface, because the parsers become easier to read that way in my opinion.
If you aren't familiar with the basic applicative functions, leave a comment and I'll explain them. You can look them up on Hoogle if you are unsure.
As I've understood your problem, you want a parser for some data structure like this:
data Test = Test String Numbers
data Numbers = List [Int] | Range Int Int
A parser that can parse such a data structure would look like this (I've not compiled the code, but it should work):
-- parses "test <identifier> [<numbers>] end"
testParser :: Parser Test
testParser =
Test <$> reserved "test" *> identifier
<*> symbol "[" *> numbersParser <* symbol "]"
<* reserved "end"
<?> "test"
numbersParser :: Parser Numbers
numbersParser = try listParser <|> rangeParser
-- parses "<natural>, <natural>, <natural>" etc
listParser :: Parser Numbers
listParser =
List <$> sepBy natural (symbol ",")
<?> "list"
-- parses "<natural> ... <natural>"
rangeParser :: Parser Numbers
rangeParser =
Range <$> natural <* symbol "..."
<*> natural
<?> "range"

Using `opt` combinator in uu-parsinglib

I am writing a parser for a simple text template language for my project, and I am completely stuck on opt combinator in uu-parsinglib (version 2.7.3.2 in case that matters). Any ideas on how to use it properly?
Here is a very simplified example that shows my predicament.
{-# LANGUAGE FlexibleContexts #-}
import Text.ParserCombinators.UU hiding (pEnd)
import Text.ParserCombinators.UU.Utils
import Text.ParserCombinators.UU.BasicInstances
pIdentifier :: Parser String
pIdentifier = pMany pLetter
pIfClause :: Parser ((String, String), String, Maybe (String, String), String)
pIfClause = (,,,) <$> pIf <*> pIdentifier <*> pOptionalElse <*> pEnd
pIf :: Parser (String, String)
pIf = pBraces ((,) <$> pToken "if " <*> pIdentifier)
pOptionalElse :: Parser (Maybe (String, String))
pOptionalElse = (((\x y -> Just (x, y)) <$> pElse <*> pIdentifier) `opt` Nothing)
pElse :: Parser String
pElse = pBraces (pToken "else")
pEnd :: Parser String
pEnd = pBraces (pToken "end")
main :: IO ()
main = do
putStrLn $ show $ runParser "works" pIfClause "{if abc}def{else}ghi{end}"
putStrLn $ show $ runParser "doesn't work" pIfClause "{if abc}def{end}"
The first string parses properly but the second fails with error:
main: Failed parsing 'doesn't work' :
Expected at position LineColPos 0 12 12 expecting one of [Whitespace, "else"] at LineColPos 0 12 12 :
v
{if abc}def{end}
^
The documentation for opt says:
If p can be recognized, the return value of p is used. Otherwise, the value v is used. Note that opt by default is greedy.
What greedy means is explained in the documentation for <<|>:
<<|> is the greedy version of <|>. If its left hand side parser can make any progress then it commits to that alternative.
In your case, the first argument to opt does recognize part of the input, because else and end both start with e. Thus, it commits to pElse, which fails and makes the whole parse fail.
An easy way to solve this is to use ... <|> pure Nothing, as the documentation suggests.

Resources