Implementing a lexer using the Free Monad - parsing

I am thinking about a use case of the free monad which would be a simple lexing DSL. So far I came up with some primitive operations:
data LexF r where
POP :: (Char -> r) -> LexF r
PEEK :: (Char -> r) -> LexF r
FAIL :: LexF r
...
instance Functor LexF where
...
type Lex = Free LexF
The problem I encounter is that I would like to have a CHOICE primitive that would describe an operation of trying to execute one parser and in case of failure fallback to another. Something like CHOICE :: LexF r -> LexF r -> (r -> r) -> LexF r...
...and here the stairs begin. Since r is preset at contravariant position, it is impossible (is it?) to create a valid Functor instance for Op. I came up with some other idea, which was to generalize over the type of alternative lexers, so CHOICE :: LexF a -> LexF a -> (a -> r) -> LexF r – now it works as a Functor, though the problem is with thawing it into Free, as I would normally do it with liftF:
choice :: OpF a -> OpF a -> OpF a
choice c1 c2 = liftF $ CHOICE _ _ id -- how to fill the holes :: Op a ?
I am really running out of any ideas. This of course generalizes to nearly all other combinators, I just find CHOICE a good minimal case. How to tackle it? I am okay to hear that this example is totally broken and it just won't work with Free as I would like to. But therefore, does it even make sense to write lexers/parsers in this manner?

As a general rule when working with free monads, you don't want to introduce primitives for "monadic control". For example, a SEQUENCE primitive would be ill-advised, because the free monad itself provides sequencing. Likewise, a CHOICE primitive is ill-advised because this should be provided by a free
MonadPlus.
Now, there is no free MonadPlus in modern versions of free because equivalent functionality is provided by a free monad transformer over a list base monad, namely FreeT f []. So, what you probably want is to define:
data LexF r where
POP :: (Char -> r) -> LexF r
PEEK :: (Char -> r) -> LexF r
deriving instance Functor LexF
type Lex = FreeT LexF []
pop :: (Char -> a) -> Lex a
pop f = liftF $ POP f
peek :: (Char -> a) -> Lex a
peek f = liftF $ PEEK f
but no FAIL or CHOICE primitives.
If you were to define fail and choice combinators, they would be defined by means of the list base monad using transformer magic:
fail :: Lex a
fail = empty
choice :: Lex a -> Lex a -> Lex a
choice = (<|>)
though there's no actual reason to define these.
SPOILERS follow... Anyway, you can now write things like:
anyChar :: Lex Char
anyChar = pop id
char :: Char -> Lex Char
char c = do
c' <- anyChar
guard $ c == c'
return c'
a_or_b :: Lex Char
a_or_b = char 'a' <|> char 'b'
With an interpreter for your monad primitives, in this case intrepreting them to the StateT String [] AKA String -> [(a,String)] monad:
type Parser = StateT String []
runLex :: Lex a -> Parser a
runLex = iterTM go
where go :: LexF (Parser a) -> Parser a
go (POP f) = StateT pop' >>= f
where pop' (c:cs) = [(c,cs)]
pop' _ = []
go (PEEK f) = StateT peek' >>= f
where peek' (c:cs) = [(c,c:cs)]
peek' _ = []
parse :: Lex a -> String -> [(a, String)]
parse = runStateT . runLex
you can then:
main :: IO ()
main = do
let test = parse a_or_b
print $ test "abc"
print $ test "bca"
print $ test "cde"
The full example:
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE GADTs #-}
{-# OPTIONS_GHC -Wall #-}
import Control.Monad.State
import Control.Applicative
import Control.Monad.Trans.Free
data LexF r where
POP :: (Char -> r) -> LexF r
PEEK :: (Char -> r) -> LexF r
deriving instance Functor LexF
type Lex = FreeT LexF []
pop :: (Char -> a) -> Lex a
pop f = liftF $ POP f
peek :: (Char -> a) -> Lex a
peek f = liftF $ PEEK f
anyChar :: Lex Char
anyChar = pop id
char :: Char -> Lex Char
char c = do
c' <- anyChar
guard $ c == c'
return c'
a_or_b :: Lex Char
a_or_b = char 'a' <|> char 'b'
type Parser = StateT String []
runLex :: Lex a -> Parser a
runLex = iterTM go
where go :: LexF (Parser a) -> Parser a
go (POP f) = StateT pop' >>= f
where pop' (c:cs) = [(c,cs)]
pop' _ = []
go (PEEK f) = StateT peek' >>= f
where peek' (c:cs) = [(c,c:cs)]
peek' _ = []
parse :: Lex a -> String -> [(a, String)]
parse = runStateT . runLex
main :: IO ()
main = do
let test = parse a_or_b
print $ test "abc"
print $ test "bca"
print $ test "cde"

Related

Haskell : Operator Parser keeps going to undefined rather than inputs

I'm practicing writing parsers. I'm using Tsodings JSON Parser video as reference. I'm trying to add to it by being able to parse arithmetic of arbitrary length and I have come up with the following AST.
data HVal
= HInteger Integer -- No Support For Floats
| HBool Bool
| HNull
| HString String
| HChar Char
| HList [HVal]
| HObj [(String, HVal)]
deriving (Show, Eq, Read)
data Op -- There's only one operator for the sake of brevity at the moment.
= Add
deriving (Show, Read)
newtype Parser a = Parser {
runParser :: String -> Maybe (String, a)
}
The following functions is my attempt of implementing the operator parser.
ops :: [Char]
ops = ['+']
isOp :: Char -> Bool
isOp c = elem c ops
spanP :: (Char -> Bool) -> Parser String
spanP f = Parser $ \input -> let (token, rest) = span f input
in Just (rest, token)
opLiteral :: Parser String
opLiteral = spanP isOp
sOp :: String -> Op
sOp "+" = Add
sOp _ = undefined
parseOp :: Parser Op
parseOp = sOp <$> (charP '"' *> opLiteral <* charP '"')
The logic above is similar to how strings are parsed therefore my assumption was that the only difference was looking specifically for an operator rather than anything that's not a number between quotation marks. It does seemingly begin to parse correctly but it then gives me the following error:
λ > runParser parseOp "\"+\""
Just ("+\"",*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
undefined, called at /DIRECTORY/parser.hs:110:11 in main:Main
I'm confused as to where the error is occurring. I'm assuming it's to do with sOp mainly due to how the other functions work as intended as the rest of parseOp being a translation of the parseString function:
stringLiteral :: Parser String
stringLiteral = spanP (/= '"')
parseString :: Parser HVal
parseString = HString <$> (charP '"' *> stringLiteral <* charP '"')
The only reason why I have sOp however is that if it was replaced with say Op, I would get the error that the following doesn't exist Op :: String -> Op. When I say this my inclination was that the string coming from the parsed expression would be passed into this function wherein I could return the appropriate operator. This however is incorrect and I'm not sure how to proceed.
charP and Applicative Instance
charP :: Char -> Parser Char
charP x = Parser $ f
where f (y:ys)
| y == x = Just (ys, x)
| otherwise = Nothing
f [] = Nothing
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input', a) <- q input
Just (input', f a)
The implementation of (<*>) is the culprit. You did not use input' in the next call to q, but used input instead. As a result you pass the string to the next parser without "eating" characters. You can fix this with:
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input'', a) <- q input'
Just (input'', f a)
With the updated instance for Applicative, we get:
*Main> runParser parseOp "\"+\""
Just ("",Add)

Combining parsers in Haskell

I'm given the following parsers
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
fmap f p = Parser $ \s -> (\(a,c) -> (f a, c)) <$> parse p s
instance Applicative Parser where
pure a = Parser $ \s -> Just (a,s)
f <*> a = Parser $ \s ->
case parse f s of
Just (g,s') -> parse (fmap g a) s'
Nothing -> Nothing
instance Alternative Parser where
empty = Parser $ \s -> Nothing
l <|> r = Parser $ \s -> parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
ensure p parser = Parser $ \s ->
case parse parser s of
Nothing -> Nothing
Just (a,s') -> if p a then Just (a,s') else Nothing
lookahead :: Parser (Maybe Char)
lookahead = Parser f
where f [] = Just (Nothing,[])
f (c:s) = Just (Just c,c:s)
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
where f [] = Nothing
f (x:xs) = if p x then Just (x,xs) else Nothing
eof :: Parser ()
eof = Parser $ \s -> if null s then Just ((),[]) else Nothing
eof' :: Parser ()
eof' = ???
I need to write a new parser eof' that does exactly what eof does but is built only using the given parsers and the
Functor/Applicative/Alternative instances above. I'm stuck on this as I don't have experience in combining parsers. Can anyone help me out ?
To understand it easier, we can write it in an equational pseudocode, while we substitute and simplify the definitions, using Monad Comprehensions for clarity and succinctness.
Monad Comprehensions are just like List Comprehensions, only working for any MonadPlus type, not just []; while corresponding closely to do notation, e.g. [ (f a, s') | (a, s') <- parse p s ] === do { (a, s') <- parse p s ; return (f a, s') }.
This gets us:
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
parse (fmap f p) s = [ (f a, s') | (a, s') <- parse p s ]
instance Applicative Parser where
parse (pure a) s = pure (a, s)
parse (pf <*> pa) s = [ (g a, s'') | (g, s') <- parse pf s
, (a, s'') <- parse pa s' ]
instance Alternative Parser where
parse empty s = empty
parse (l <|> r) s = parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
parse (ensure pred p) s = [ (a, s') | (a, s') <- parse p s, pred a ]
lookahead :: Parser (Maybe Char)
parse lookahead [] = pure (Nothing, [])
parse lookahead s#(c:_) = pure (Just c, s )
satisfy :: (Char -> Bool) -> Parser Char
parse (satisfy p) [] = mzero
parse (satisfy p) (x:xs) = [ (x, xs) | p x ]
eof :: Parser ()
parse eof s = [ ((), []) | null s ]
eof' :: Parser ()
eof' = ???
By the way thanks to the use of Monad Comprehensions and the more abstract pure, empty and mzero instead of their concrete representations in terms of the Maybe type, this same (pseudo-)code will work with a different type, like [] in place of Maybe, viz. newtype Parser a = Parser { parse :: String -> [(a,String)] }.
So we have
ensure :: (a -> Bool) -> Parser a -> Parser a
lookahead :: Parser (Maybe Char)
(satisfy is no good for us here .... why?)
Using that, we can have
ensure ....... ...... :: Parser (Maybe Char)
(... what does ensure id (pure False) do? ...)
but we'll have a useless Nothing result in case the input string was in fact empty, whereas the eof parser given to use produces the () as its result in such case (and otherwise it produces nothing).
No fear, we also have
fmap :: ( a -> b ) -> Parser a -> Parser b
which can transform the Nothing into () for us. We'll need a function that will always do this for us,
alwaysUnit nothing = ()
which we can use now to arrive at the solution:
eof' = fmap ..... (..... ..... ......)

Cascading Parsers in Haskell

My parser type is
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
I have two parsers :
1) a = (satisfy isAlpha) that knows how to match the first alpha numeric character in a string.
Running parse a "k345" gives Just ('k',"345")
2) b = many (satisfy isDigit) that knows how to match any number of digits. Running parse b "1234 abc" gives Just ("1234"," abc")
Now I want to combine those two parsers and match a singe alphanumeric character followed by any number of digits.
I tried:
parse (a *> b) "k1234 7" and got Just ("1234"," 7 "). Looks like the 'k' matched by the first parser a is gone from the output. How do I fix this problem ?
Thanks!
For a toy parser, look the following code:
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances #-}
module Parse where
import Data.Char
import Data.List
newtype Parser a = Parser
{ parse :: String -> Maybe (a, String) }
satisfy :: (Char -> Bool) -> Parser Char
satisfy cond = Parser $ \s ->
case s of
"" -> Nothing
(c:cs) -> if cond c then Just (c, cs) else Nothing
many :: Parser a -> Parser [a]
many p = Parser $ \s ->
case parse p s of
Nothing -> Just ([], s)
Just (c, cs) -> let Just (cc, cs') = parse (many p) cs
in Just (c:cc, cs')
string :: String -> Parser String
string str = Parser $ \s -> if isPrefixOf str s
then Just (str, drop (length str) s)
else Nothing
instance Functor Parser where
fmap f (Parser g) = Parser $ \s ->
case g s of
Nothing -> Nothing
Just (r, remain) -> Just (f r, remain)
instance Applicative Parser where
pure a = Parser $ \s -> Just (a, s)
-- (<*>) :: f (a -> b) -> f a -> f b
(Parser f) <*> (Parser g) = Parser $ \s ->
case f s of
Nothing -> Nothing
Just (ab, remain) -> case g remain of
Nothing -> Nothing
Just (r, remain1) -> Just (ab r, remain1)
instance Semigroup a => Semigroup (Parser a) where
(Parser p1) <> (Parser p2) = Parser $ \s ->
case p1 s of
Nothing -> Nothing
Just (r1, s1) -> case p2 s1 of
Nothing -> Nothing
Just (r2, s2) -> Just (r1 <> r2, s2)
instance (Monoid a, Semigroup (Parser a))=> Monoid (Parser a) where
mempty = Parser $ \s -> Just (mempty, s)
mappend = (<>)
a = satisfy isAlpha
b = many (satisfy isDigit)
λ> parse a "k345"
Just ('k',"345")
λ> parse b "12345 abc"
Just ("12345"," abc")
λ> parse (a *> b) "k1234 7"
Just ("1234"," 7")
λ> parse (string "k" <> b) "k1234 7"
Just ("k1234"," 7")
So maybe you should find some tutorials and try to be familiar with Functor, Applicative, and Monad. See, you can implement the instance of a Monoid for your Parser type, and then you can use (<>) to combine your parsed results together.
It looks like this is working fine :
parse (fmap (:) (satisfy isAlpha) <*> many (satisfy isDigit)) "k1234 7"
And gives back what I wanted
Just ("k1234"," 7")

Minimal Purely Applicative Parser

I'm trying to figure out how to build a "purely applicative parser" based on a simple parser implementation. The parser would not use monads in its implementation. I asked this question previously but mis-framed it so I'm trying again.
Here is the basic type and its Functor, Applicative and Alternative implementations:
newtype Parser a = Parser { parse :: String -> [(a,String)] }
instance Functor Parser where
fmap f (Parser cs) = Parser (\s -> [(f a, b) | (a, b) <- cs s])
instance Applicative Parser where
pure = Parser (\s -> [(a,s)])
(Parser cs1) <*> (Parser cs2) = Parser (\s -> [(f a, s2) | (f, s1) <- cs1 s, (a, s2) <- cs2 s1])
instance Alternative Parser where
empty = Parser $ \s -> []
p <|> q = Parser $ \s ->
case parse p s of
[] -> parse q s
r -> r
The item function takes a character off the stream:
item :: Parser Char
item = Parser $ \s ->
case s of
[] -> []
(c:cs) -> [(c,cs)]
At this point, I want to implement digit. I can of course do this:
digit = Parser $ \s ->
case s of
[] -> []
(c:cs) -> if isDigit c then [(c, cs)] else []
but I'm replicating the code of item. I'd like to implement digit based on item.
How do I go about implementing digit, using item to take a character off the stream and then checking to see if the character is a digit without bringing monadic concepts into the implementation?
First, let us write down all the tools we currently have at hand:
-- Data constructor
Parser :: (String -> [(a, String)]) -> Parser a
-- field accessor
parse :: Parser a -> String -> [(a, String)]
-- instances, replace 'f' by 'Parser'
fmap :: Functor f => (a -> b) -> f a -> f b
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
pure :: Applicative f => a -> f a
-- the parser at hand
item :: Parser Char
-- the parser we want to write with item
digit :: Parser Char
digit = magic item
-- ?
magic :: Parser Char -> Parser Char
The real question at hand is "what is magic"? There are only so many things we can use. Its type indicates fmap, but we can rule that out. All we can provide is some function a -> b, but there is no f :: Char -> Char that makes fmap f indicate a failure.
What about (<*>), can this help? Well, again, the answer is no. The only thing we can do here is to take the (a -> b) out of the context and apply it; whatever that means in the context of the given Applicative. We can rule pure out.
The problem is that we need to check the Char that item might parse and change the context. We need something like Char -> Parser Char
But we didn't rule Parser or parse out!
magic p = Parser $ \s ->
case parse p s of -- < item will be used here
[(c, cs)] -> if isDigit c then [(c, cs)] else []
_ -> []
Yes, I know, it's duplicate code, but now it's using item. It's using item before inspecting the character. That's the only way we can use item here. And now, there is some kind of sequence implied: item has to succeed before digit can do it's work.
Alternatively, we could have tried this way:
digit' c :: Char -> Parser Char
digit' c = if isDigit c then pure c else empty
But then fmap digit' item would have the type Parser (Parser Char), which can only get collapsed with a join-like function. That's why monads are more powerful than applicative.
That being said, you can get around all of the monad requirements if you use a more general function first:
satisfy :: (Char -> Bool) -> Parser Char
satisfy = Parser $ \s ->
case s of
(c:cs) | p c -> [(c, cs)]
_ -> []
You can then define both item and digit in terms of satisfy:
item = satisfy (const True)
digit = satisfy isDigit
That way digit does not have to inspect the result of a previous parser.
Functors allow you to act on somethings values. For example, if you have a list [1,2,3], you can change the contents. Note that Functors do not allow changing structure. map can not change the length of a list.
Applicatives allow you to combine structure, and the content is mushed together somehow. But the values can not change influence the structure.
Namely, given an item, we can change its structure, and we can change its content, but the content can not change the structure. We can't choose to fail on some content and not other.
If anyone knows how to state this more formally and provably, I'm all ears (it probably has to do with free theorems).

Purely Applicative Parser using Alternative

In a previous post, a user offered an implementation of a purely applicative parser for Haskell (code originally from here). Below is the partial implementation of that parser:
{-# LANGUAGE Rank2Types #-}
import Control.Applicative (Alternative(..))
import Data.Foldable (asum, traverse_)
The type:
newtype Parser a = Parser {run :: forall f. Alternative f => (Char -> f ()) -> f a}
The instances:
instance Functor Parser where
fmap f (Parser cont) = Parser $ \char -> f <$> cont char
instance Applicative Parser where
pure a = Parser $ \char -> pure a
(Parser contf) <*> (Parser cont) = Parser $ \char -> (contf char) <*> (cont char)
instance Alternative Parser where
empty = Parser $ \char -> empty
(Parser cont) <|> (Parser cont') = Parser $ \char -> (cont char) <|> (cont' char)
some (Parser cont) = Parser $ \char -> some $ cont char
many (Parser cont) = Parser $ \char -> many $ cont char
Some example parsers:
item = Parser $ \char -> asum $ map (\c -> c <$ char c) ['A'..'z']
digit = Parser $ \char -> asum $ map (\c -> c <$ char (head $ show c)) [0..9]
string s = Parser $ \char -> traverse_ char s
Unfortunately, I'm having a hard time trying to understand how I might use this parser implementation. In particular, I do not understand what Char -> f () should/could be and how I could use this to do simple parsing, e.g. to extra a digit out of an input string. I'd like a concrete example if possible. Could someone please shed some light?
In forall f. Alternative f => (Char -> f ()) -> f a, the Char -> f () is something that you are provided with. Your mission, should you choose to accept it, is to then turn that into an f a using only these two bits:
The Char -> f () function (i.e. a way to parse a single character: if the next character matches the argument, the parsing succeeds; otherwise it doesn't.)
The Alternative instance of f
So how would you parse a single digit into an Int? It would have to be of the form
digit :: Parser Int
digit = Parser $ \parseChar -> _
In _, we have to create an f Int using the kit parseChar :: Char -> f () and Alternative f. We know how to parse a single '0' character: parseChar '0' succeds iff the next character is '0'. We can turn it into a value of Int via f's Functor instance, arriving at
digit0 :: Parser Int
digit0 = Parser $ \parseChar -> fmap (const 0) (parseChar '0')
But f is not just Functor, it is also Alternative, so we can write digit in long-form as
digit :: Parser Int
digit = Parser $ \parseChar -> fmap (const 0) (parseChar '0') <|>
fmap (const 1) (parseChar '1') <|>
fmap (const 2) (parseChar '2') <|>
fmap (const 3) (parseChar '3') <|>
fmap (const 4) (parseChar '4') <|>
fmap (const 5) (parseChar '5') <|>
fmap (const 6) (parseChar '6') <|>
fmap (const 7) (parseChar '7') <|>
fmap (const 8) (parseChar '8') <|>
fmap (const 9) (parseChar '9')
And from here, it is merely a matter of pedestrian Haskell programming to cut down on the cruft, arriving at something like
digit :: Parser Int
digit = Parser $ \parseChar -> asum [fmap (const d) (parseChar c) | d <- [0..9], let [c] = show d]
which we can further simplify by noting that fmap (const x) f can be written as x <$ f, giving
digit :: Parser Int
digit = Parser $ \parseChar -> asum [d <$ parseChar c | d <- [0..9], let [c] = show d]
The Char -> f () part represents matching on a single character. Namely, if you do char 'c', it will match on 'c' and fail on everything else.
To use it, you can convert it to, say Parsec:
convert :: Parser a -> Parsec a
convert p = run p anyChar
p is essentially of the type forall f. Alternative f => (Char -> f ()) -> f a, which specializes to (Char -> Parsec ()) -> Parsec a. We pass in anyChar, and it will produce a Parsec a value by using anyChar and any Alternative operations.
Basically, a Parser a it is a function that, given away to match on a single character, and an Alternative instance, it will produce an Alternative value.

Resources