Haskell Parser Separator - parsing

I am using Parsec to write a parser for a logfile. Every line of that logfile follows a common structure A:B:C:D with the components A, B, C and D following simple rules. I've already written parsers for each of the components and I would like to combine them into a single parser. My current approach works, but I feel there has to be a nicer solution. One immediate drawback is that it would not scale very well for logfiles with more than 4 components.
parser :: (a -> b -> c -> d -> e) -> Parser a -> Parser b -> Parser c -> Parser d -> Parser e
parser f pa pb pc pd = f <$> pa <* (char ':') <*> pb <* (char ':') <*> pc <* (char ':') <*> pd
I searched for a fitting parser combinator, but the only combinator coming close is sepBy, which does not work for this use case. Any help is appreciated!

I think the best option is to introduce your own operator, e.g.:
infixl 4 <:>
p <:> q = p <* char ':' <*> q
Then you don't need to define a separate parser function; it is just as easy to write the implementation directly:
myParser = f <$> pa <:> pb <:> pc <:> pd
This is easily extended:
myParser2 = g <$> pa <:> pb <:> pc <:> pd <:> pe
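For reference, here is a minimal self-contained sketch of the operator in use. The Entry type and the word and number component parsers are hypothetical stand-ins for the A, B, C and D parsers from the question, and the type signature given to <:> is one plausible choice rather than the only one.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

infixl 4 <:>

-- Run the left parser, skip a ':' separator, then apply the function
-- produced so far to the result of the right parser.
(<:>) :: Parser (a -> b) -> Parser a -> Parser b
p <:> q = p <* char ':' <*> q

-- Hypothetical log entry with four components (not from the question).
data Entry = Entry String Int String Int
  deriving (Show, Eq)

-- Hypothetical component parsers.
word :: Parser String
word = many1 letter

number :: Parser Int
number = read <$> many1 digit

entry :: Parser Entry
entry = Entry <$> word <:> number <:> word <:> number

main :: IO ()
main = print (parse entry "" "boot:12:disk:7")
```

Because <:> has the same fixity as <$> and <*> (infixl 4), the chain associates to the left and no parentheses are needed.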

Related

Chain two parsers in Haskell (Parsec)

Parsec provides an operator to choose between two parsers:
(<|>)
:: Text.Parsec.Prim.ParsecT s u m a
-> Text.Parsec.Prim.ParsecT s u m a
-> Text.Parsec.Prim.ParsecT s u m a
Is there a similar function to chain two parsers? I didn't find one with the same signature using Hoogle.
As an example, let's say I want to parse any word optionally followed by a single digit. My first idea was to use >> but it doesn't seem to work.
parser = many1 letter >> optional (fmap pure digit)
I used fmap pure in order to convert the digit to an actual string and thus match the parsed type of many1 letter. I don't know if it is useful.
Try this:
parser = (++) <$> many1 letter <*> option "" (fmap pure digit)
This is equivalent to:
parser = pure (++) <*> many1 letter <*> option "" (fmap pure digit)
option "" (fmap pure digit) returns the empty string if the digit parser fails, and a one-character string containing the digit otherwise.
You can also use do-notation for more readable code:
parser = do
  s1 <- many1 letter
  s2 <- option "" (fmap pure digit)
  return (s1 ++ s2)

Haskell - intersperse a parser with another one

I have two parsers, parser1 :: Parser a and parser2 :: Parser b.
I would now like to parse a list of as, interspersed with occurrences of parser2.
The desired signature is something like
interspersedParser :: Parser b -> Parser a -> Parser [a]
For example, if Parser a parses the 'a' character and Parser b parses the 'b' character, then interspersedParser should parse
""
"a"
"aba"
"ababa"
...
I'm using megaparsec. Is there already some combinator which behaves like this, which I'm currently not able to find?
In parsec there is a sepBy parser which does that. The same parser seems to be available in megaparsec as well: https://hackage.haskell.org/package/megaparsec-4.4.0/docs/Text-Megaparsec-Combinator.html
Sure, you can use sepBy, but isn't this just:
interspersedParser sepP thingP = (:) <$> thingP <*> many (sepP *> thingP)
EDIT: Oh, this requires at least one thing to be there. You also wanted empty, so just stick a <|> pure [] on the end.
In fact, this is basically how sepBy1 (a variant of sepBy that requires at least one) is implemented:
-- | @sepBy p sep@ parses /zero/ or more occurrences of @p@, separated
-- by @sep@. Returns a list of values returned by @p@.
--
-- > commaSep p = p `sepBy` comma
sepBy :: Alternative m => m a -> m sep -> m [a]
sepBy p sep = sepBy1 p sep <|> pure []
{-# INLINE sepBy #-}
-- | @sepBy1 p sep@ parses /one/ or more occurrences of @p@, separated
-- by @sep@. Returns a list of values returned by @p@.
sepBy1 :: Alternative m => m a -> m sep -> m [a]
sepBy1 p sep = (:) <$> p <*> many (sep *> p)
{-# INLINE sepBy1 #-}
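A small runnable sketch of the sepBy approach on the inputs from the question. It uses parsec rather than megaparsec (an assumption made only so that the example runs with the libraries that ship with GHC); the sepBy combinator has the same argument order in both.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Separator first, element second, matching the signature asked for.
interspersedParser :: Parser b -> Parser a -> Parser [a]
interspersedParser sepP thingP = thingP `sepBy` sepP

main :: IO ()
main = mapM_ (print . parse p "") ["", "a", "aba", "ababa"]
  where
    -- eof makes sure the whole input was consumed.
    p = interspersedParser (char 'b') (char 'a') <* eof
```

Note that an input such as "ab" is rejected: once the separator has been consumed, another element must follow.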

Applicative parser stuck in infinite loop

I'm trying to implement my own Applicative parser, here's the code I use:
{-# LANGUAGE ApplicativeDo, LambdaCase #-}
module Parser where
-- Implementation of an Applicative Parser
import Data.Char
import Control.Applicative (some, many, empty, (<*>), (<$>), (<|>), Alternative)
data Parser a = Parser { runParser :: String -> [(a, String)] }
instance Functor Parser where
  -- fmap :: (a -> b) -> (Parser a -> Parser b)
  fmap f (Parser p) = Parser (\s -> [(f a, s') | (a,s') <- p s])
instance Applicative Parser where
  -- pure :: a -> Parser a
  -- <*> :: Parser (a -> b) -> Parser a -> Parser b
  pure x = Parser $ \s -> [(x, s)]
  (Parser pf) <*> (Parser p) = Parser $ \s ->
    [(f a, s'') | (f, s') <- pf s, (a, s'') <- p s']
instance Alternative Parser where
  -- empty :: Parser a
  -- <|> :: Parser a -> Parser a -> Parser a
  empty = Parser $ \_s -> []
  (Parser p1) <|> (Parser p2) = Parser $ \s ->
    case p1 s of [] -> p2 s
                 xs -> xs
char :: Char -> Parser Char
char c = Parser $ \case (c':cs) | c == c' -> [(c,cs)] ; _ -> []
main = print $ runParser (some $ char 'A') "AAA"
When I run it, it gets stuck and never returns. After digging into the problem I pinpointed the root cause to be my implementation of the <|> method. If I use the following implementation then everything goes as expected:
instance Alternative Parser where
  empty = Parser $ \_s -> []
  p1 <|> p2 = Parser $ \s ->
    case runParser p1 s of [] -> runParser p2 s
                           xs -> xs
These two implementations are, in my understanding, quite equivalent. What I guess is that this may have something to do with Haskell's lazy evaluation scheme. Can someone explain what's going on?
Fact "star": in your implementation of (<*>):
Parser p1 <*> Parser p2 = ...
...we must compute enough to know that both arguments are actually applications of the Parser constructor to something before we may proceed to the right-hand side of the equation.
Fact "pipe strict": in this implementation:
Parser p1 <|> Parser p2 = ...
...we must compute enough to know that both parsers are actually applications of the Parser constructor to something before we may proceed to the right-hand side of the equals sign.
Fact "pipe lazy": in this implementation:
p1 <|> p2 = Parser $ ...
...we may proceed to the right-hand side of the equals sign without doing any computation on p1 or p2.
This is important, because:
some v = some_v where
  some_v = pure (:) <*> v <*> (some_v <|> pure [])
Let's take your first implementation, the one about which we know the "pipe strict" fact. We want to know if some_v is an application of Parser to something. Thanks to fact "star", we must therefore know whether pure (:), v, and some_v <|> pure [] are applications of Parser to something. To know this last one, by fact "pipe strict", we must know whether some_v and pure [] are applications of Parser to something. Whoops! We just showed that to know whether some_v is an application of Parser to something, we need to know whether some_v is an application of Parser to something -- an infinite loop!
On the other hand, with your second implementation, to check whether some_v is a Parser _, we still must check pure (:), v, and some_v <|> pure [], but thanks to fact "pipe lazy", that's all we need to check -- we can be confident that some_v <|> pure [] is a Parser _ without first checking recursively that some_v and pure [] are.
(And next, you will learn about newtype -- and be confused yet again when changing from data to newtype makes both implementations work!)
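To illustrate that last remark, here is a sketch of the parser from the question with data swapped for newtype and nothing else changed in substance. A newtype constructor has no runtime representation, so matching on Parser p forces nothing, and the constructor patterns in (<|>) no longer cause the loop. The code avoids LambdaCase only to keep the example free of extensions.

```haskell
import Control.Applicative (Alternative (..))

-- newtype instead of data: matching on `Parser p` does not evaluate
-- the argument, so the "pipe strict" fact no longer applies.
newtype Parser a = Parser { runParser :: String -> [(a, String)] }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> [(f a, s') | (a, s') <- p s]

instance Applicative Parser where
  pure x = Parser $ \s -> [(x, s)]
  Parser pf <*> Parser p = Parser $ \s ->
    [(f a, s'') | (f, s') <- pf s, (a, s'') <- p s']

instance Alternative Parser where
  empty = Parser $ const []
  -- The same constructor patterns that looped with `data`.
  Parser p1 <|> Parser p2 = Parser $ \s ->
    case p1 s of
      [] -> p2 s
      xs -> xs

char :: Char -> Parser Char
char c = Parser $ \s -> case s of
  (c' : cs) | c == c' -> [(c, cs)]
  _                   -> []

main :: IO ()
main = print $ runParser (some (char 'A')) "AAA"
```

With the data declaration, a similar effect can be had by writing the patterns lazily, for example ~(Parser p1) <|> ~(Parser p2).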

How do I implement an Applicative instance for a parser without assuming Monad?

I can't figure out how to implement an Applicative instance for this parser:
newtype Parser m s a = Parser { getParser :: [s] -> m ([s], a) }
without assuming Monad m. I expected to only have to assume Applicative m, since the Functor instance only has to assume Functor m. I finally ended up with:
instance Functor m => Functor (Parser m s) where
  fmap f (Parser g) = Parser (fmap (fmap f) . g)
instance Monad m => Applicative (Parser m s) where
  pure a = Parser (\xs -> pure (xs, a))
  Parser f <*> Parser x = Parser h
    where
      h xs = f xs >>= \(ys, f') ->
        x ys >>= \(zs, x') ->
        pure (zs, f' x')
How do I do this? I tried substituting in for >>= by hand, but always wound up getting stuck trying to reduce a join -- which would also require Monad.
I also consulted Parsec, but even that wasn't much help:
instance Applicative.Applicative (ParsecT s u m) where
  pure = return
  (<*>) = ap
My reasons for asking this question are purely self-educational.
It's not possible. Look at the inside of your newtype:
getParser :: [s] -> m ([s], a)
Presumably, you want to pass [s] to the input of y in x <*> y. This is exactly the difference between Monad m and Applicative m:
In Monad you can use the output of one computation as the input to another.
In Applicative, you cannot.
It's possible if you do a funny trick:
Parser x <*> Parser y = Parser $
  \s -> (\(_, xv) (s', yv) -> (s', xv yv)) <$> x s <*> y s
However, this is almost certainly not the definition that you want, since it parses x and y in parallel.
Fixes
Your ParserT can be Applicative quite easily:
newtype ParserT m s a = ParserT { runParser :: [s] -> m ([s], a) }
-- or, equivalently
newtype ParserT m s a = ParserT (StateT [s] m a)
instance Monad m => Applicative (ParserT m s) where
  ...
Note that ParserT m s is not an instance of Monad as long as you don't define the Monad instance.
You can move the leftover characters outside the parser:
newtype ParserT m s a = ParserT { runParser :: [s] -> ([s], m a) }
instance Applicative m => Applicative (ParserT m s) where
  ParserT x <*> ParserT y = ParserT $ \s ->
    let (s', x') = x s
        (s'', y') = y s'
    in (s'', x' <*> y')  -- return the leftover input together with the combined result
  ...
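A fuller sketch of this second fix, filling in pure and a Functor instance so that it compiles with only an Applicative m constraint. The field is named runParserT here, and the item primitive with m = Maybe is a hypothetical example that does not appear in the answer above.

```haskell
-- The leftover input lives outside m, so threading it needs no bind.
newtype ParserT m s a = ParserT { runParserT :: [s] -> ([s], m a) }

instance Functor m => Functor (ParserT m s) where
  fmap f (ParserT p) = ParserT $ \s ->
    let (s', x) = p s
    in (s', fmap f x)

instance Applicative m => Applicative (ParserT m s) where
  pure a = ParserT $ \s -> (s, pure a)
  ParserT x <*> ParserT y = ParserT $ \s ->
    let (s', x')  = x s
        (s'', y') = y s'
    in (s'', x' <*> y')

-- Hypothetical primitive: consume one token, failing on empty input.
item :: ParserT Maybe s s
item = ParserT $ \s -> case s of
  []       -> ([], Nothing)
  (c : cs) -> (cs, Just c)

main :: IO ()
main = print $ runParserT ((,) <$> item <*> item) "abc"
```

The trade-off is that the leftover input cannot depend on anything computed inside m, so how much input a parser consumes must be decided without looking at the m a result.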
Full marks for aiming to use Applicative as much as possible - it's much cleaner.
Headline: Your parser can stay Applicative, but your collection of possible parses needs to be stored in a Monad. Internal structure: uses a monad. External structure: is applicative.
You're using m ([s],a) to represent a bunch of possible parses. When you parse the next input, you want it to depend on what's already been parsed, but you're using m because there's potentially less than or more than one possible parse; you want to do \([s],a) -> ... and work with that to make a new m ([s],a). That process is called binding and uses >>= or equivalent, so your container is definitely a Monad, no escape.
It's not all that bad using a monad for your container - it's just a container you're keeping some stuff in after all. There's a difference between using a monad internally and being a monad. Your parsers can be applicative whilst using a monad inside.
See What are the benefits of applicative parsing over monadic parsing?.
If your parsers are applicative, they're simpler, so in theory you can do some optimisation when you combine them, by keeping static information about what they do instead of keeping their implementation. For example,
string "Hello World!" <|> string "Hello Mum!"
  == (++) <$> string "Hello " <*> (string "World!" <|> string "Mum!")
The second version is better than the first because it does no backtracking.
If you do a lot of this, it's like when a regular expression is compiled before it's run, creating a graph (finite state automaton) and simplifying it as much as possible and eliminating a whole load of inefficient backtracking.
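As a concrete illustration with Parsec (an assumption; the answer above does not name a library), the unfactored form needs try to backtrack over the shared "Hello " prefix, while the factored form commits to the prefix once and only chooses between alternatives that differ in their first character.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Unfactored: without `try`, the first branch would consume "Hello "
-- before failing, and Parsec would not attempt the second branch.
unfactored :: Parser String
unfactored = try (string "Hello World!") <|> string "Hello Mum!"

-- Factored: the common prefix is parsed once, with no backtracking.
factored :: Parser String
factored = (++) <$> string "Hello " <*> (string "World!" <|> string "Mum!")

main :: IO ()
main = do
  print (parse unfactored "" "Hello Mum!")
  print (parse factored "" "Hello Mum!")
```

Both parsers accept the same two strings; the difference is only in how much input is re-read when the first alternative fails.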

Partitioning different types of terms during parsing

I have two parsers for different types of terms.
a :: Parser A
b :: Parser B
I have a data type representing sequences of these terms.
data C = C [A] [B]
If my input is a sequence of mixed terms, what’s a good way of writing c :: Parser C to separate the As from the Bs, preserving their order? For example, given these definitions:
data A = A Char
data B = B Char
a = A <$> oneOf "Aa"
b = B <$> oneOf "Bb"
"abAbBBA" would parse to the sequences aAA and bbBB. I have a feeling I need to use StateT, but am unsure of the specifics and just need a push in the right direction.
A simple solution is to first parse it to a list of Either A B and then use partitionEithers to split this into two lists which you then apply the C constructor to.
c :: Parser C
c = uncurry C . partitionEithers <$> many ((Left <$> a) <|> (Right <$> b))
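Put together with the definitions from the question, this gives the following self-contained sketch. It assumes Parsec's Text.Parsec.String.Parser as the Parser type; the deriving clauses are added only so that the result can be printed and compared.

```haskell
import Data.Either (partitionEithers)
import Text.Parsec
import Text.Parsec.String (Parser)

data A = A Char deriving (Show, Eq)
data B = B Char deriving (Show, Eq)
data C = C [A] [B] deriving (Show, Eq)

a :: Parser A
a = A <$> oneOf "Aa"

b :: Parser B
b = B <$> oneOf "Bb"

-- Parse a mixed sequence as [Either A B], then split it into the two
-- lists; partitionEithers keeps the relative order within each list.
c :: Parser C
c = uncurry C . partitionEithers <$> many ((Left <$> a) <|> (Right <$> b))

main :: IO ()
main = print (parse (c <* eof) "" "abAbBBA")
```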
To solve your problem I'd use partitionEithers from Data.Either, the code is unchecked but it shouldn't be far off...
c :: Parser C
c = (post . partitionEithers) <$> many1 aORb
  where
    post (as,bs) = C as bs
aORb :: Parser (Either A B)
aORb = (Left <$> a) <|> (Right <$> b)
Edit -
Snap!

Resources