What do the "some" and "many" functions of Alternative do? [duplicate] - parsing

This question already has answers here:
What are Alternative's "some" and "many" useful for?
(5 answers)
'some' and 'many' functions from the 'Alternative' type class [duplicate]
(2 answers)
Closed 2 years ago.
I'm working through Write You a Haskell, and I'm on the part where he implements "Nanoparsec", a Haskell parser from first principles. I'm getting stuck on the Alternative instance of the parser, specifically the some and many implementations:
instance Alternative Parser where
-- empty = ...
-- (<|>) p q = ...
some v = some_v
where
many_v = some_v <|> pure []
some_v = (:) <$> v <*> many_v
many v = many_v
where
many_v = some_v <|> pure []
some_v = (:) <$> v <*> many_v
I have no idea what these two functions are doing. From what I can see, some takes a parser, applies it, and concatenates the result until the input is used up. But many looks like it does the same thing. What's happening here?

Related

Self made "Many1" parser

This is task from online course. I've been sitting on this for two days. Please give some explanation or hints to solve it.
Here's type
newtype Prs a = Prs { runPrs :: String -> Maybe (a, String) }
I need to implement many1 parser. This is how it should work
> runPrs (many1 $ char 'A') "AAABCDE"
Just ("AAA","BCDE")
> runPrs (many1 $ char 'A') "BCDE"
Nothing
I have parser many implemented like that
many p = (:) <$> p <*> many p <|> pure []
Here's output for previous example.
*Main> test9
Just ("AAA","BCDE")
*Main> test10
Just ("","BCDE")
Note last result, it returns empty string but many1 should return Nothing. I don't know how to change many code to make work like many1. I can't undestand how to stop on first incorrect symbol.
Your many1 will need some way to fail: as you've written it it consumes characters for a while, consing them onto a pending result, until it eventually runs out of matches. This doesn't cover any cases where the parse could fail.
What you've implemented here is, in a way, many0, a parser which consumes 0 or more repetitions of something. Can you think of a way to implement many1 in terms of many0? It will look something like:
Consume one instance of p, without an alternative in case that fails
Consume 0 or more instances of p, returning [] when that fails.
Or in Haskell,
many1 :: Prs a -> Prs [a]
many1 p = (:) <$> p <*> many0 p

Conditional looping in an Applicative Functor

Suppose that Parser x is a parser that parses an x. This parser probably possesses a many combinator, that parses zero or more occurrences of something (stopping when the item parser fails).
I can see how one might implement that if Parser forms a monad. I can't figure out how to do it if Parser is only an Applicative Functor. There doesn't seem to be any way to check the previous result and decide what to do next (precisely the notion that monads add). What am I missing?
The Alternative type class provides the many combinator:
class Applicative f => Alternative f where
empty :: f a
(<|>) :: f a -> f a -> f a
many :: f a -> f [a]
some :: f a -> f [a]
some = some'
many = many'
many' a = some' a <|> pure []
some' a = (:) <$> a <*> many' a
The many a combinator means “zero or more” a.
The some a combinator means “one or more” a.
Hence:
The some a combinator returns a list of one a followed by many a (i.e. 1 + (0 or more)).
The many a combinator returns either some a or an empty list (i.e. (1 or more) | 0).
The many combinator depends upon the (<|>) operator which can be viewed as the default operator in languages like JavaScript. For example, consider the Alternative instance of Maybe:
instance Alternative Maybe where
empty = Nothing
Nothing <|> r = r
l <|> _ = l
Essentially the (<|>) should return the left hand side value if it's truthy. Otherwise it should return the right hand side value.
A Parser is a data structure which is defined similarly to Maybe (the idea of applicative lexer combinators and parser combinators is essentially the same):
data Lexer a = Fail | Ok (Maybe a) (Vec (Lexer a))
If parsing fails, the Fail value is returned. Otherwise an Ok value is returned. Since Fail <|> pure [] is pure [], this is how the many combinator knows when to stop and return an empty list.
It can't be done just by using what is provided by Applicative. But Alternative has a function that gives you power beyond Applicative:
(<|>) :: f a -> f a -> f a
This function lets you "combine" two Alternatives without any restriction whatsoever on the a. But how? Something intrinsic to the particular functor f must give you a means to do that.
Typically, Alternatives require some notion of failure or emptiness. Like for parsers, where (<|>) means "try to parse this, if it fails, try this other thing". But this "dependence on a previous value" is hidden in the machinery implementing (<|>). It is not available to the external interface, so to speak.
From (<|>), one can implement a zero-or-one combinator:
optional :: Alternative f => f a -> f (Maybe a)
optional v = Just <$> v <|> pure Nothing
The definitions of some an many are similar but they require mutually recursive functions.
Notice that there are Applicatives that aren't Alternatives. You can't make the Identity functor an Alternative, for example. How would you implement empty?
many is a class method of the Alternative class (link) which suggests that an general applicative functor does not always have a many implementation.

Haskell Parsec Parser for Encountering [...]

I'm attempting to write a parser in Haskell using Parsec. Currently I have a program that can parse
test x [1,2,3] end
The code that does this is given as follows
testParser = do {
reserved "test";
v <- identifier;
symbol "[";
l <- sepBy natural commaSep;
symbol "]";
p <- pParser;
return $ Test v (List l) p
} <?> "end"
where commaSep is defined as
commaSep = skipMany1 (space <|> char ',')
Now is there some way for me to parse a similar statement, specifically:
test x [1...3] end
Being new to Haskell, and Parsec for that matter, I'm sure there's some nice concise way of doing this that I'm just not aware of. Any help would be appreciated.
Thanks again.
I'll be using some functions from Control.Applicative like (*>). These functions are useful if you want to avoid the monadic interface of Parsec and prefer the applicative interface, because the parsers become easier to read that way in my opinion.
If you aren't familiar with the basic applicative functions, leave a comment and I'll explain them. You can look them up on Hoogle if you are unsure.
As I've understood your problem, you want a parser for some data structure like this:
data Test = Test String Numbers
data Numbers = List [Int] | Range Int Int
A parser that can parse such a data structure would look like this (I've not compiled the code, but it should work):
-- parses "test <identifier> [<numbers>] end"
testParser :: Parser Test
testParser =
Test <$> reserved "test" *> identifier
<*> symbol "[" *> numbersParser <* symbol "]"
<* reserved "end"
<?> "test"
numbersParser :: Parser Numbers
numbersParser = try listParser <|> rangeParser
-- parses "<natural>, <natural>, <natural>" etc
listParser :: Parser Numbers
listParser =
List <$> sepBy natural (symbol ",")
<?> "list"
-- parses "<natural> ... <natural>"
rangeParser :: Parser Numbers
rangeParser =
Range <$> natural <* symbol "..."
<*> natural
<?> "range"

Partitioning different types of terms during parsing

I have two parsers for different types of terms.
a :: Parser A
b :: Parser B
I have a data type representing sequences of these terms.
data C = C [A] [B]
If my input is a sequence of mixed terms, what’s a good way of writing c :: Parser C to separate the As from the Bs, preserving their order? For example, given these definitions:
data A = A Char
data B = B Char
a = A <$> oneOf "Aa"
b = B <$> oneOf "Bb"
"abAbBBA" would parse to the sequences aAA and bbBB. I have a feeling I need to use StateT, but am unsure of the specifics and just need a push in the right direction.
A simple solution is to first parse it to a list of Either A B and then use partitionEithers to split this into two lists which you then apply the C constructor to.
c :: Parser C
c = uncurry C . partitionEithers <$> many ((Left <$> a) <|> (Right <$> b))
To solve your problem I'd use partitionEithers from Data.Either, the code is unchecked but it shouldn't be far off...
c :: Parser C
c = (post . partitionEithers ) <$> many1 aORb
where
post (as,bs) = C as bs
aORb :: Parser (Either A B)
aORb = (Left <$> a) <|> (Right <$> b)
Edit -
Snap!

Making attoparsec parsers recursive

I've been coding up an attoparsec parser and have been hitting a pattern where I want to turn parsers into recursive parsers (recursively combining them with the monad bind >>= operator).
So I created a function to turn a parser into a recursive parser as follows:
recursiveParser :: (a -> A.Parser a) -> a -> A.Parser a
recursiveParser parser a = (parser a >>= recursiveParser parser) <|> return a
Which is useful if you have a recursive data type like
data Expression = ConsExpr Expression Expression | EmptyExpr
parseRHS :: Expression -> Parser Expression
parseRHS e = ConsExpr e <$> parseFoo
parseExpression :: Parser Expression
parseExpression = parseLHS >>= recursiveParser parseRHS
where parseLHS = parseRHS EmptyExpr
Is there a more idiomatic solution? It almost seems like recursiveParser should be some kind of fold... I also saw sepBy in the docs, but this method seems to suit me better for my application.
EDIT: Oh, actually now that I think about it should actually be something similar to fix... Don't know how I forgot about that.
EDIT2: Rotsor makes a good point with his alternative for my example, but I'm afraid my AST is actually a bit more complicated than that. It actually looks something more like this (although this is still simplified)
data Segment = Choice1 Expression
| Choice2 Expression
data Expression = ConsExpr Segment Expression
| Token String
| EmptyExpr
where the string a -> b brackets to the right and c:d brackets to the left, with : binding more tightly than ->.
I.e. a -> b evaluates to
(ConsExpr (Choice1 (Token "a")) (Token "b"))
and c:d evaluates to
(ConsExpr (Choice2 (Token "d")) (Token "c"))
I suppose I could use foldl for the one and foldr for the other but there's still more complexity in there. Note that it's recursive in a slightly strange way, so "a:b:c -> e:f -> :g:h ->" is actually a valid string, but "-> a" and "b:" are not. In the end fix seemed simpler to me. I've renamed the recursive method like so:
fixParser :: (a -> A.Parser a) -> a -> A.Parser a
fixParser parser a = (parser a >>= fixParser parser) <|> pure a
Thanks.
Why not just parse a list and fold it into whatever you want later?
Maybe I am missing something, but this looks more natural to me:
consChain :: [Expression] -> Expression
consChain = foldl ConsExpr EmptyExpr
parseExpression :: Parser Expression
parseExpression = consChain <$> many1 parseFoo
And it's shorter too.
As you can see, consChain is now independent from parsing and can be useful somewhere else. Also, if you separate out the result folding, the somewhat unintuitive recursive parsing simplifies down to many or many1 in this case.
You may want to take a look at how many is implemented too:
many :: (Alternative f) => f a -> f [a]
many v = many_v
where many_v = some_v <|> pure []
some_v = (:) <$> v <*> many_v
It has a lot in common with your recursiveParser:
some_v is similar to parser a >>= recursiveParser parser
many_v is similar to recursiveParser parser
You may ask why I called your recursive parser function unintuitive. This is because this pattern allows parser argument to affect the parsing behaviour (a -> A.Parser a, remember?), which may be useful, but not obviously (I don't see a use case for this yet). The fact that your example does not use this feature makes it look redundant.

Resources