Conditional looping in an Applicative Functor - parsing

Suppose that Parser x is a parser that parses an x. This parser probably possesses a many combinator, that parses zero or more occurrences of something (stopping when the item parser fails).
I can see how one might implement that if Parser forms a monad. I can't figure out how to do it if Parser is only an Applicative Functor. There doesn't seem to be any way to check the previous result and decide what to do next (precisely the notion that monads add). What am I missing?

The Alternative type class provides the many combinator:
class Applicative f => Alternative f where
empty :: f a
(<|>) :: f a -> f a -> f a
many :: f a -> f [a]
some :: f a -> f [a]
some = some'
many = many'
many' a = some' a <|> pure []
some' a = (:) <$> a <*> many' a
The many a combinator means “zero or more” a.
The some a combinator means “one or more” a.
Hence:
The some a combinator returns a list of one a followed by many a (i.e. 1 + (0 or more)).
The many a combinator returns either some a or an empty list (i.e. (1 or more) | 0).
The many combinator depends upon the (<|>) operator which can be viewed as the default operator in languages like JavaScript. For example, consider the Alternative instance of Maybe:
instance Alternative Maybe where
empty = Nothing
Nothing <|> r = r
l <|> _ = l
Essentially the (<|>) should return the left hand side value if it's truthy. Otherwise it should return the right hand side value.
A Parser is a data structure which is defined similarly to Maybe (the idea of applicative lexer combinators and parser combinators is essentially the same):
data Lexer a = Fail | Ok (Maybe a) (Vec (Lexer a))
If parsing fails, the Fail value is returned. Otherwise an Ok value is returned. Since Fail <|> pure [] is pure [], this is how the many combinator knows when to stop and return an empty list.

It can't be done just by using what is provided by Applicative. But Alternative has a function that gives you power beyond Applicative:
(<|>) :: f a -> f a -> f a
This function lets you "combine" two Alternatives without any restriction whatsoever on the a. But how? Something intrinsic to the particular functor f must give you a means to do that.
Typically, Alternatives require some notion of failure or emptiness. Like for parsers, where (<|>) means "try to parse this, if it fails, try this other thing". But this "dependence on a previous value" is hidden in the machinery implementing (<|>). It is not available to the external interface, so to speak.
From (<|>), one can implement a zero-or-one combinator:
optional :: Alternative f => f a -> f (Maybe a)
optional v = Just <$> v <|> pure Nothing
The definitions of some an many are similar but they require mutually recursive functions.
Notice that there are Applicatives that aren't Alternatives. You can't make the Identity functor an Alternative, for example. How would you implement empty?

many is a class method of the Alternative class (link) which suggests that an general applicative functor does not always have a many implementation.

Related

Why use lambda instead of pattern matching?

I am watching a tutorial on parsers in haskell https://www.youtube.com/watch?v=9FGThag0Fqs. The lecture starts with defining some really basic parsers. These are to be used together to create more complicated parsers later. One of the basic parsers is item. This is used to extract a character from the string we are parsing.
All parsers have the following type:
type Parser a = String -> [(a, String)]
The parser item is defined like this:
item :: Parser Char
item = \inp -> case inp of
[] -> []
(x:xs) -> [(x,xs)]
I am not so used to this syntax, so it looks strange to me. I would have written it:
item' :: Parser Char
item' [] = []
item' (x:xs) = [(x,xs)]
Testing it in ghci indicates that they are equal:
*Main> item ""
[]
*Main> item "abc"
[('a',"bc")]
*Main> item' ""
[]
*Main> item' "abc"
[('a',"bc")]
The lecturer makes a short comment about thinking it looks clearer, but I disagree. So my questions are:
Are they indeed completely identical?
Why is the lambda version clearer?
I believe this comes from the common practice of writing
f :: Type1 -> ... -> Typen -> Result
f x1 ... xn = someResult
where we have exactly n function arrows in the type, and exactly n arguments in the left hand side of the equation. This makes it easy to relate types and formal parameters.
If Result is a type alias for a function, then we may write
f :: Type1 -> ... -> Typen -> Result
f x1 ... xn y = something
or
f :: Type1 -> ... -> Typen -> Result
f x1 ... xn = \y -> something
The latter follows the convention above: n arrows, n variables in the left hand side. Also, on the right hand side we have something of type Result, making it easier to spot. The former instead does not, and one might miss the extra argument when reading the code quickly.
Further, this style makes it easy to convert Result to a newtype instead of a type alias:
newtype Result = R (... -> ...)
f :: Type1 -> ... -> Typen -> Result
f x1 ... xn = R $ \y -> something
The posted item :: Parser Char code is an instance of this style when n=0.
Why you should avoid equational function definitions (by Roman Cheplyaka):
http://ro-che.info/articles/2014-05-09-clauses
Major Points from the above link:
DRY: Function and argument names are repeated --> harder to refactor
Clearer shows which arguments the function decides upon
Easy to add pragmas (e.g. for profiling)
Syntactically closer to lower level code
This doesn't explain the lambda though..
I think they are absolutely equal. The lambda-style definition puts a name item to an anonymous lambda function which does pattern matching inside. The pattern-matching-style definition defines it directly. But in the end both are functions that do pattern matching. I think it's a matter of personal taste.
Also, the lambda-style definition could be considered to be in pointfree style, i.e. a function defined without explicitly writing down its arguments actually it is not very much pointfree since the argument is still written (but in a different place), but in this case you don't get anything with this.
Here is another possible definion, somewhere in between of those two:
item :: Parser Char
item inp = case inp of
[] -> []
(x:xs) -> [(x, xs)]
It's essentially identical to the lambda-style, but not pointfree.

Implement `Applicative Parser`'s Apply Function

From Brent Yorgey's 2013 Penn class, after getting help on defining a Functor Parser, I'm attempting to make an Applicative Parser:
--p1 <*> p2 represents the parser which first runs p1 (which will
--consume some input and produce a function), then passes the
--remaining input to p2 (which consumes more input and produces
--some value), then returns the result of applying the function to the
--value
Here's my attempt:
instance Applicative (Parser) where
pure x = Parser $ \_ -> Just (x, [])
(Parser f) <*> (Parser g) = case (\ys -> f ys) of Nothing -> Parser Nothing
Just (_, xs) -> Parser $ g xs
However, I'm getting compile-time errors on the apply (<*>) definition.
Intuitively, I believe that using <*> achieves AND functionality.
If I have a parser for foo and a parser for bar, then I should be able to use apply <*> to say: foo followed by bar. In other words, input of foobar should successfully match, whereas foobip would not. It would fail on the second parser.
However, I believe that the types are:
Parser (a -> b) -> Parser a -> Parser b
So, that makes me think that my intuition is not entirely correct.
Please give me a tip to guide me towards understanding how to implement apply.
Your code is predicated on a misunderstanding of what a Parser is. Don't worry, virtually everybody makes this mistake.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }
Lets break down what this means.
String -> Maybe (a, String)
[1] [2] [3] [4]
[1]: I take a string and return Maybe (a, String)
[2]: I might not succeed in parsing the input into the desired datatype
[3]: The desired type I am parsing the String into
[4]: Remaining input after having consumed the amount of data required to parse a
Parser is a function of text input to Maybe a tuple of a value and the rest of the text. Parser is emphatically not a tuple, otherwise you wouldn't have a parser. Just data in a tuple.
I'm not going to tell you how to implement <*> and nobody else should either as it would deprive you of the experience.
However, I'll give you pure so you understand the basic pattern:
pure a = Parser (\s -> Just (a, s))
See? It's a function of s -> Maybe (a, s). I intentionally mimicked the type variables in my terms to make it more obvious.

How to parse arbitrary lists with Haskell parsers?

Is it possible to use one of the parsing libraries (e.g. Parsec) for parsing something different than a String? And how would I do this?
For the sake of simplicity, let's assume the input is a list of ints [Int]. The task could be
drop leading zeros
parse the rest into the pattern (S+L+)*, where S is a number less than 10, and L is a number larger or equal to ten.
return a list of tuples (Int,Int), where fst is the product of the S and snd is the product of the L integers
It would be great if someone could show how to write such a parser (or something similar).
Yes, as user5402 points out, Parsec can parse any instance of Stream, including arbitrary lists. As there are no predefined token parsers (as there are for text) you have to roll your own, (myToken below) using e.g. tokenPrim
The only thing I find a bit awkward is the handling of "source positions". SourcePos is an abstract type (rather than a type class) and forces me to use its "filename/line/column" format, which feels a bit unnatural here.
Anyway, here is the code (without the skipping of leading zeroes, for brevity)
import Text.Parsec
myToken :: (Show a) => (a -> Bool) -> Parsec [a] () a
myToken test = tokenPrim show incPos $ justIf test where
incPos pos _ _ = incSourceColumn pos 1
justIf test x = if (test x) then Just x else Nothing
small = myToken (< 10)
large = myToken (>= 10)
smallLargePattern = do
smallints <- many1 small
largeints <- many1 large
let prod = foldl1 (*)
return (prod smallints, prod largeints)
myIntListParser :: Parsec [Int] () [(Int,Int)]
myIntListParser = many smallLargePattern
testMe :: [Int] -> [(Int, Int)]
testMe xs = case parse myIntListParser "your list" xs of
Left err -> error $ show err
Right result -> result
Trying it all out:
*Main> testMe [1,2,55,33,3,5,99]
[(2,1815),(15,99)]
*Main> testMe [1,2,55,33,3,5,99,1]
*** Exception: "your list" (line 1, column 9):
unexpected end of input
Note the awkward line/column format in the error message
Of course one could write a function sanitiseSourcePos :: SourcePos -> MyListPosition
There is very likely a way to get Parsec to use [a] as the stream type, but the idea behind parser combinators is actually very simple, and it's not very difficult to roll your own library.
A very accessible resource I would recommend is Monadic Parsing in Haskell by Graham Hutton and Erik Meijer.
Indeed, right now Erik Meijer is teaching an intro Haskell/functional programming course on edx.org (link) and Lecture 7 is all about functional parsers. As he states in the intro to the lecture:
"... No one can follow the path towards mastering functional programming without writing their own parser combinator library. We start by explaining what parsers are and how they can naturally be viewed as side-effecting functions. Next we define a number of basic parsers and higher-order functions for combining parsers. ..."

Using Parsec to write a Read instance

Using Parsec, I'm able to write a function of type String -> Maybe MyType with relative ease. I would now like to create a Read instance for my type based on that; however, I don't understand how readsPrec works or what it is supposed to do.
My best guess right now is that readsPrec is used to build a recursive parser from scratch to traverse a string, building up the desired datatype in Haskell. However, I already have a very robust parser who does that very thing for me. So how do I tell readsPrec to use my parser? What is the "operator precedence" parameter it takes, and what is it good for in my context?
If it helps, I've created a minimal example on Github. It contains a type, a parser, and a blank Read instance, and reflects quite well where I'm stuck.
(Background: The real parser is for Scheme.)
However, I already have a very robust parser who does that very thing for me.
It's actually not that robust, your parser has problems with superfluous parentheses, it won't parse
((1) (2))
for example, and it will throw an exception on some malformed inputs, because
singleP = Single . read <$> many digit
may use read "" :: Int.
That out of the way, the precedence argument is used to determine whether parentheses are necessary in some place, e.g. if you have
infixr 6 :+:
data a :+: b = a :+: b
data C = C Int
data D = D C
you don't need parentheses around a C 12 as an argument of (:+:), since the precedence of application is higher than that of (:+:), but you'd need parentheses around C 12 as an argument of D.
So you'd usually have something like
readsPrec p = needsParens (p >= precedenceLevel) someParser
where someParser parses a value from the input without enclosing parentheses, and needsParens True thing parses a thing between parentheses, while needsParens False thing parses a thing optionally enclosed in parentheses [you should always accept more parentheses than necessary, ((((((1)))))) should parse fine as an Int].
Since the readsPrec p parsers are used to parse parts of the input as parts of the value when reading lists, tuples etc., they must return not only the parsed value, but also the remaining part of the input.
With that, a simple way to transform a parsec parser to a readsPrec parser would be
withRemaining :: Parser a -> Parser (a, String)
withRemaining p = (,) <$> p <*> getInput
parsecToReadsPrec :: Parser a -> Int -> ReadS a
parsecToReadsPrec parsecParser prec input
= case parse (withremaining $ needsParens (prec >= threshold) parsecParser) "" input of
Left _ -> []
Right result -> [result]
If you're using GHC, it may however be preferable to use a ReadPrec / ReadP parser (built using Text.ParserCombinators.ReadP[rec]) instead of a parsec parser and define readPrec instead of readsPrec.

Making attoparsec parsers recursive

I've been coding up an attoparsec parser and have been hitting a pattern where I want to turn parsers into recursive parsers (recursively combining them with the monad bind >>= operator).
So I created a function to turn a parser into a recursive parser as follows:
recursiveParser :: (a -> A.Parser a) -> a -> A.Parser a
recursiveParser parser a = (parser a >>= recursiveParser parser) <|> return a
Which is useful if you have a recursive data type like
data Expression = ConsExpr Expression Expression | EmptyExpr
parseRHS :: Expression -> Parser Expression
parseRHS e = ConsExpr e <$> parseFoo
parseExpression :: Parser Expression
parseExpression = parseLHS >>= recursiveParser parseRHS
where parseLHS = parseRHS EmptyExpr
Is there a more idiomatic solution? It almost seems like recursiveParser should be some kind of fold... I also saw sepBy in the docs, but this method seems to suit me better for my application.
EDIT: Oh, actually now that I think about it should actually be something similar to fix... Don't know how I forgot about that.
EDIT2: Rotsor makes a good point with his alternative for my example, but I'm afraid my AST is actually a bit more complicated than that. It actually looks something more like this (although this is still simplified)
data Segment = Choice1 Expression
| Choice2 Expression
data Expression = ConsExpr Segment Expression
| Token String
| EmptyExpr
where the string a -> b brackets to the right and c:d brackets to the left, with : binding more tightly than ->.
I.e. a -> b evaluates to
(ConsExpr (Choice1 (Token "a")) (Token "b"))
and c:d evaluates to
(ConsExpr (Choice2 (Token "d")) (Token "c"))
I suppose I could use foldl for the one and foldr for the other but there's still more complexity in there. Note that it's recursive in a slightly strange way, so "a:b:c -> e:f -> :g:h ->" is actually a valid string, but "-> a" and "b:" are not. In the end fix seemed simpler to me. I've renamed the recursive method like so:
fixParser :: (a -> A.Parser a) -> a -> A.Parser a
fixParser parser a = (parser a >>= fixParser parser) <|> pure a
Thanks.
Why not just parse a list and fold it into whatever you want later?
Maybe I am missing something, but this looks more natural to me:
consChain :: [Expression] -> Expression
consChain = foldl ConsExpr EmptyExpr
parseExpression :: Parser Expression
parseExpression = consChain <$> many1 parseFoo
And it's shorter too.
As you can see, consChain is now independent from parsing and can be useful somewhere else. Also, if you separate out the result folding, the somewhat unintuitive recursive parsing simplifies down to many or many1 in this case.
You may want to take a look at how many is implemented too:
many :: (Alternative f) => f a -> f [a]
many v = many_v
where many_v = some_v <|> pure []
some_v = (:) <$> v <*> many_v
It has a lot in common with your recursiveParser:
some_v is similar to parser a >>= recursiveParser parser
many_v is similar to recursiveParser parser
You may ask why I called your recursive parser function unintuitive. This is because this pattern allows parser argument to affect the parsing behaviour (a -> A.Parser a, remember?), which may be useful, but not obviously (I don't see a use case for this yet). The fact that your example does not use this feature makes it look redundant.

Resources