Implement `Applicative Parser`'s Apply Function - parsing

From Brent Yorgey's 2013 Penn class, after getting help on defining a Functor Parser, I'm attempting to make an Applicative Parser:
--p1 <*> p2 represents the parser which first runs p1 (which will
--consume some input and produce a function), then passes the
--remaining input to p2 (which consumes more input and produces
--some value), then returns the result of applying the function to the
--value
Here's my attempt:
instance Applicative (Parser) where
pure x = Parser $ \_ -> Just (x, [])
(Parser f) <*> (Parser g) = case (\ys -> f ys) of Nothing -> Parser Nothing
Just (_, xs) -> Parser $ g xs
However, I'm getting compile-time errors on the apply (<*>) definition.
Intuitively, I believe that using <*> achieves AND functionality.
If I have a parser for foo and a parser for bar, then I should be able to use apply <*> to say: foo followed by bar. In other words, input of foobar should successfully match, whereas foobip would not. It would fail on the second parser.
However, I believe that the types are:
Parser (a -> b) -> Parser a -> Parser b
So, that makes me think that my intuition is not entirely correct.
Please give me a tip to guide me towards understanding how to implement apply.

Your code is predicated on a misunderstanding of what a Parser is. Don't worry, virtually everybody makes this mistake.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }
Lets break down what this means.
String -> Maybe (a, String)
[1] [2] [3] [4]
[1]: I take a string and return Maybe (a, String)
[2]: I might not succeed in parsing the input into the desired datatype
[3]: The desired type I am parsing the String into
[4]: Remaining input after having consumed the amount of data required to parse a
Parser is a function of text input to Maybe a tuple of a value and the rest of the text. Parser is emphatically not a tuple, otherwise you wouldn't have a parser. Just data in a tuple.
I'm not going to tell you how to implement <*> and nobody else should either as it would deprive you of the experience.
However, I'll give you pure so you understand the basic pattern:
pure a = Parser (\s -> Just (a, s))
See? It's a function of s -> Maybe (a, s). I intentionally mimicked the type variables in my terms to make it more obvious.

Related

Implementation differences between parser combinators and packrat algorithm

In order to have a better understanding of packrat I've tried to have a look at the provided implementation coming with the paper (I'm focusing on the bind):
instance Derivs d => Monad (Parser d) where
-- Sequencing combinator
(Parser p1) >>= f = Parser parse
where parse dvs = first (p1 dvs)
first (Parsed val rem err) =
let Parser p2 = f val
in second err (p2 rem)
first (NoParse err) = NoParse err
second err1 (Parsed val rem err) =
Parsed val rem (joinErrors err1 err)
second err1 (NoParse err) =
NoParse (joinErrors err1 err)
-- Result-producing combinator
return x = Parser (\dvs -> Parsed x dvs (nullError dvs))
-- Failure combinator
fail [] = Parser (\dvs -> NoParse (nullError dvs))
fail msg = Parser (\dvs -> NoParse (msgError (dvPos dvs) msg))
For me it looks like (errors handling aside) to parser combinators (such as this simplified version of Parsec):
bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s
I'm quite confused because before that I thought that the big difference was that packrat was a parser generator with a memoization part.
Sadly it seems that this concept is not used in this implementation.
What is the big difference between parser combinators and packrat at implementation level?
PS: I also have had a look at Papillon but it seems to be very different from the implementation coming with the paper.
The point here is really that this Packrat parser combinator library is not a full implementation of the Packrat algorithm, but more like a set of definitions that can be reused between different packrat parsers.
The real trick of the packrat algorithm (namely the memoization of parse results) happens elsewhere.
Look at the following code (taken from Ford's thesis):
data Derivs = Derivs {
dvAdditive :: Result Int,
dvMultitive :: Result Int,
dvPrimary :: Result Int,
dvDecimal :: Result Int,
dvChar :: Result Char}
pExpression :: Derivs -> Result ArithDerivs Int
Parser pExpression = (do char ’(’
l <- Parser dvExpression
char ’+’
r <- Parser dvExpression
char ’)’
return (l + r))
</> (do Parser dvDecimal)
Here, it's important to notice that the recursive call of the expression parser to itself is broken (in a kind of open-recursion fashion) by simply projecting the appropriate component of the Derivs structure.
This recursive knot is then tied in the "recursive tie-up function" (again taken from Ford's thesis):
parse :: String -> Derivs
parse s = d where
d = Derivs add mult prim dec chr
add = pAdditive d
mult = pMultitive d
prim = pPrimary d
dec = pDecimal d
chr = case s of
(c:s’) -> Parsed c (parse s’)
[] -> NoParse
These snippets are really where the packrat trick happens.
It's important to understand that this trick cannot be implemented in a standard way in a traditional parser combinator library (at least in a pure programming language like Haskell), because it needs to know the recursive structure of the grammar.
There are experimental approaches to parser combinator libraries that use a particular representation of the recursive structure of the grammar, and there it is possible to provide a standard implementation of Packrat.
For example, my own grammar-combinators library (not maintained atm, sorry) offers an implementation of Packrat.
As stated elsewhere, packrat is not an alternative to combinators, but is an implementation option in those parsers. Pyparsing is a combinator that offers an optional packrat optimization by calling ParserElement.enablePackrat(). Its implementation is almost a drop-in replacement for pyparsing's _parse method (renamed to _parseNoCache), with a _parseCache method.
Pyparsing uses a fixed-length FIFO queue for its cache, since packrat cache entries get stale once the combinator fully matches the cached expression and moves on through the input stream. A custom cache size can be passed as an integer argument to enablePackrat(), or if None is passed, the cache is unbounded. A packrat cache with the default value of 128 was only 2% less efficient than an unbounded cache against the supplied Verilog parser, with significant savings in memory.

Conditional looping in an Applicative Functor

Suppose that Parser x is a parser that parses an x. This parser probably possesses a many combinator, that parses zero or more occurrences of something (stopping when the item parser fails).
I can see how one might implement that if Parser forms a monad. I can't figure out how to do it if Parser is only an Applicative Functor. There doesn't seem to be any way to check the previous result and decide what to do next (precisely the notion that monads add). What am I missing?
The Alternative type class provides the many combinator:
class Applicative f => Alternative f where
empty :: f a
(<|>) :: f a -> f a -> f a
many :: f a -> f [a]
some :: f a -> f [a]
some = some'
many = many'
many' a = some' a <|> pure []
some' a = (:) <$> a <*> many' a
The many a combinator means “zero or more” a.
The some a combinator means “one or more” a.
Hence:
The some a combinator returns a list of one a followed by many a (i.e. 1 + (0 or more)).
The many a combinator returns either some a or an empty list (i.e. (1 or more) | 0).
The many combinator depends upon the (<|>) operator which can be viewed as the default operator in languages like JavaScript. For example, consider the Alternative instance of Maybe:
instance Alternative Maybe where
empty = Nothing
Nothing <|> r = r
l <|> _ = l
Essentially the (<|>) should return the left hand side value if it's truthy. Otherwise it should return the right hand side value.
A Parser is a data structure which is defined similarly to Maybe (the idea of applicative lexer combinators and parser combinators is essentially the same):
data Lexer a = Fail | Ok (Maybe a) (Vec (Lexer a))
If parsing fails, the Fail value is returned. Otherwise an Ok value is returned. Since Fail <|> pure [] is pure [], this is how the many combinator knows when to stop and return an empty list.
It can't be done just by using what is provided by Applicative. But Alternative has a function that gives you power beyond Applicative:
(<|>) :: f a -> f a -> f a
This function lets you "combine" two Alternatives without any restriction whatsoever on the a. But how? Something intrinsic to the particular functor f must give you a means to do that.
Typically, Alternatives require some notion of failure or emptiness. Like for parsers, where (<|>) means "try to parse this, if it fails, try this other thing". But this "dependence on a previous value" is hidden in the machinery implementing (<|>). It is not available to the external interface, so to speak.
From (<|>), one can implement a zero-or-one combinator:
optional :: Alternative f => f a -> f (Maybe a)
optional v = Just <$> v <|> pure Nothing
The definitions of some an many are similar but they require mutually recursive functions.
Notice that there are Applicatives that aren't Alternatives. You can't make the Identity functor an Alternative, for example. How would you implement empty?
many is a class method of the Alternative class (link) which suggests that an general applicative functor does not always have a many implementation.

How to parse arbitrary lists with Haskell parsers?

Is it possible to use one of the parsing libraries (e.g. Parsec) for parsing something different than a String? And how would I do this?
For the sake of simplicity, let's assume the input is a list of ints [Int]. The task could be
drop leading zeros
parse the rest into the pattern (S+L+)*, where S is a number less than 10, and L is a number larger or equal to ten.
return a list of tuples (Int,Int), where fst is the product of the S and snd is the product of the L integers
It would be great if someone could show how to write such a parser (or something similar).
Yes, as user5402 points out, Parsec can parse any instance of Stream, including arbitrary lists. As there are no predefined token parsers (as there are for text) you have to roll your own, (myToken below) using e.g. tokenPrim
The only thing I find a bit awkward is the handling of "source positions". SourcePos is an abstract type (rather than a type class) and forces me to use its "filename/line/column" format, which feels a bit unnatural here.
Anyway, here is the code (without the skipping of leading zeroes, for brevity)
import Text.Parsec
myToken :: (Show a) => (a -> Bool) -> Parsec [a] () a
myToken test = tokenPrim show incPos $ justIf test where
incPos pos _ _ = incSourceColumn pos 1
justIf test x = if (test x) then Just x else Nothing
small = myToken (< 10)
large = myToken (>= 10)
smallLargePattern = do
smallints <- many1 small
largeints <- many1 large
let prod = foldl1 (*)
return (prod smallints, prod largeints)
myIntListParser :: Parsec [Int] () [(Int,Int)]
myIntListParser = many smallLargePattern
testMe :: [Int] -> [(Int, Int)]
testMe xs = case parse myIntListParser "your list" xs of
Left err -> error $ show err
Right result -> result
Trying it all out:
*Main> testMe [1,2,55,33,3,5,99]
[(2,1815),(15,99)]
*Main> testMe [1,2,55,33,3,5,99,1]
*** Exception: "your list" (line 1, column 9):
unexpected end of input
Note the awkward line/column format in the error message
Of course one could write a function sanitiseSourcePos :: SourcePos -> MyListPosition
There is very likely a way to get Parsec to use [a] as the stream type, but the idea behind parser combinators is actually very simple, and it's not very difficult to roll your own library.
A very accessible resource I would recommend is Monadic Parsing in Haskell by Graham Hutton and Erik Meijer.
Indeed, right now Erik Meijer is teaching an intro Haskell/functional programming course on edx.org (link) and Lecture 7 is all about functional parsers. As he states in the intro to the lecture:
"... No one can follow the path towards mastering functional programming without writing their own parser combinator library. We start by explaining what parsers are and how they can naturally be viewed as side-effecting functions. Next we define a number of basic parsers and higher-order functions for combining parsers. ..."

Creating a parser combinator of type Parser a -> Parser b -> Parser (Either a b)

I want to parse some text in which certain fields have structure most of the time but occasionally (due to special casing, typos etc) this structure is missing.
E.g. Regular case is Cost: 5, but occasionally it will read Cost: 5m or Cost: 3 + 1 per ally, or some other random stuff.
In the case of the normal parser (p) not working, I'd like to fallback to a parser which just takes the whole line as a string.
To this end, I'd like to create a combinator of type Parser a -> Parser b -> Either a b. However, I cannot work out how to inspect the results of attempting to see if the first parser succeeds or not, without doing something like case parse p "" txt of ....
I can't see a build in combinator, but I'm sure there's some easy way to solve this that I'm missing
I think you want something like this
eitherParse :: Parser a -> Parser b -> Parser (Either a b)
eitherParse a b = fmap Left (try a) <|> fmap Right b
The try is just to ensure that if a consumes some input and then fails, you'll backtrack properly. Then you can just use the normal methods for running a parser to yield Either ParseError (Either a b)
Which is quite easy to transform into your Either a b
case parse p "" str of
Right (Left a) -> useA a
Right (Right b) -> useB b
Left err -> handleParserError err
Try this: (<|>) :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a
As a rule you could use it this way:
try p <|> q

Making attoparsec parsers recursive

I've been coding up an attoparsec parser and have been hitting a pattern where I want to turn parsers into recursive parsers (recursively combining them with the monad bind >>= operator).
So I created a function to turn a parser into a recursive parser as follows:
recursiveParser :: (a -> A.Parser a) -> a -> A.Parser a
recursiveParser parser a = (parser a >>= recursiveParser parser) <|> return a
Which is useful if you have a recursive data type like
data Expression = ConsExpr Expression Expression | EmptyExpr
parseRHS :: Expression -> Parser Expression
parseRHS e = ConsExpr e <$> parseFoo
parseExpression :: Parser Expression
parseExpression = parseLHS >>= recursiveParser parseRHS
where parseLHS = parseRHS EmptyExpr
Is there a more idiomatic solution? It almost seems like recursiveParser should be some kind of fold... I also saw sepBy in the docs, but this method seems to suit me better for my application.
EDIT: Oh, actually now that I think about it should actually be something similar to fix... Don't know how I forgot about that.
EDIT2: Rotsor makes a good point with his alternative for my example, but I'm afraid my AST is actually a bit more complicated than that. It actually looks something more like this (although this is still simplified)
data Segment = Choice1 Expression
| Choice2 Expression
data Expression = ConsExpr Segment Expression
| Token String
| EmptyExpr
where the string a -> b brackets to the right and c:d brackets to the left, with : binding more tightly than ->.
I.e. a -> b evaluates to
(ConsExpr (Choice1 (Token "a")) (Token "b"))
and c:d evaluates to
(ConsExpr (Choice2 (Token "d")) (Token "c"))
I suppose I could use foldl for the one and foldr for the other but there's still more complexity in there. Note that it's recursive in a slightly strange way, so "a:b:c -> e:f -> :g:h ->" is actually a valid string, but "-> a" and "b:" are not. In the end fix seemed simpler to me. I've renamed the recursive method like so:
fixParser :: (a -> A.Parser a) -> a -> A.Parser a
fixParser parser a = (parser a >>= fixParser parser) <|> pure a
Thanks.
Why not just parse a list and fold it into whatever you want later?
Maybe I am missing something, but this looks more natural to me:
consChain :: [Expression] -> Expression
consChain = foldl ConsExpr EmptyExpr
parseExpression :: Parser Expression
parseExpression = consChain <$> many1 parseFoo
And it's shorter too.
As you can see, consChain is now independent from parsing and can be useful somewhere else. Also, if you separate out the result folding, the somewhat unintuitive recursive parsing simplifies down to many or many1 in this case.
You may want to take a look at how many is implemented too:
many :: (Alternative f) => f a -> f [a]
many v = many_v
where many_v = some_v <|> pure []
some_v = (:) <$> v <*> many_v
It has a lot in common with your recursiveParser:
some_v is similar to parser a >>= recursiveParser parser
many_v is similar to recursiveParser parser
You may ask why I called your recursive parser function unintuitive. This is because this pattern allows parser argument to affect the parsing behaviour (a -> A.Parser a, remember?), which may be useful, but not obviously (I don't see a use case for this yet). The fact that your example does not use this feature makes it look redundant.

Resources