I just started exploring the possibilities of data types à la carte in combination with indexed types. My current experiment is a bit too large to include here, but can be found here. My example is mixing together an expression from different ingredients (arithmetic, functions, ...). The goal is to enforce only well-typed expressions. That is why an index is added to the expressions (the Sort type).
I can build expressions like:
-- define expressions over variables and arithmetic (+, *, numeric constants)
type Lia = IFix (VarF :+: ArithmeticF)
-- expression of integer type/sort
t :: Lia IntegralSort
t = var "c" .+. cnst 1
This is all good as long as I construct only fixed (static) expressions.
Is there a way to read an expression from string/other representation (that obviously has to encode the sort) and produce a dynamic value that gets represented by these functors?
For example, I would like to read ((c : Int) + (1 : Int)) and represent it somehow with VarF and ArithmeticF. Here I realize I cannot obtain a value of static type Lia IntegralSort. But suppose I have in addition:
data EqualityF a where
  Equals :: forall s. a s -> a s -> EqualityF a BoolSort
I could expect there being a function that can read String into Maybe (IFix (EqualityF :+: VarF :+: ...)). Such a function would attempt to build representations for the LHS and RHS and if the sorts matched it could produce a result of statically known type IFix (EqualityF :+: ...) BoolSort. The problem is that the representation of LHS (and RHS) has no fixed static sort. Is what I am trying to do impossible with this representation I chose?
(.=.) :: EqualityF :<: f => IFix f s -> IFix f s -> IFix f BoolSort
(.=.) a b = inject (Equals a b)
You can use a GADT to hide the sort, allowing you to return values of sorts depending on the input. Pattern matching then allows you to recover the sort.
data Expr (f :: (Sort -> *) -> (Sort -> *)) where
  BoolExpr :: IFix f BoolSort -> Expr f
  IntExpr :: IFix f IntegralSort -> Expr f
Here is a simplistic parser of postfix expressions involving + and =.
parse :: (EqualityF :<: f, ArithmeticF :<: f) => String -> [Expr f] -> Maybe (Expr f)
parse (c : s) stack | isDigit c =
  parse s (IntExpr (cnst (digitToInt c)) : stack)
parse ('+' : s) (IntExpr e1 : IntExpr e2 : stack) =
  parse s (IntExpr (e1 .+. e2) : stack)
parse ('=' : s) (IntExpr e1 : IntExpr e2 : stack) =
  parse s (BoolExpr (e1 .=. e2) : stack)
parse ('=' : s) (BoolExpr e1 : BoolExpr e2 : stack) =
  parse s (BoolExpr (e1 .=. e2) : stack)
parse [] [e] = Just e
parse _ _ = Nothing
You might not like the duplicate cases for =. A more general framework is Typeable, allowing you to just test for the type equalities you need.
data SomeExpr (f :: (Sort -> *) -> Sort -> *) where
  SomeExpr :: Typeable s => IFix f s -> SomeExpr f
-- The result is a SomeExpr, since the sort of the parsed expression is hidden.
parseSome :: forall f. (EqualityF :<: f, ArithmeticF :<: f) => String -> [SomeExpr f] -> Maybe (SomeExpr f)
parseSome (c : s) stack | isDigit c =
  parseSome s (SomeExpr (cnst (digitToInt c)) : stack)
parseSome ('+' : s) (SomeExpr e1 : SomeExpr e2 : stack) = do
  e1 <- gcast e1
  e2 <- gcast e2
  parseSome s (SomeExpr (e1 .+. e2) : stack)
parseSome ('=' : s) (SomeExpr (e1 :: IFix f s1) : SomeExpr (e2 :: IFix f s2) : stack) = do
  Refl <- eqT :: Maybe (s1 :~: s2)
  parseSome s (SomeExpr (e1 .=. e2) : stack)
parseSome [] [e] = Just e
parseSome _ _ = Nothing
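To try the Typeable approach without the à la carte machinery, here is a minimal self-contained sketch. It assumes a plain indexed GADT called Term (a hypothetical stand-in for IFix f) and an existential wrapper SomeTerm; neither name comes from the question's code. Unlike the parser above, it treats the second element on the stack as the left operand.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, ScopedTypeVariables, TypeOperators #-}
import Data.Char (digitToInt, isDigit)
import Data.Type.Equality ((:~:) (Refl))
import Data.Typeable (Typeable, eqT, gcast)

data Sort = BoolSort | IntegralSort

-- Hypothetical stand-in for `IFix f`: terms indexed by their sort.
data Term (s :: Sort) where
  Cnst   :: Int -> Term 'IntegralSort
  Plus   :: Term 'IntegralSort -> Term 'IntegralSort -> Term 'IntegralSort
  Equals :: Term s -> Term s -> Term 'BoolSort

-- The sort is hidden, but Typeable lets us recover it later.
data SomeTerm where
  SomeTerm :: Typeable s => Term s -> SomeTerm

-- Postfix parser: single digits, '+' and '='.
parseSome :: String -> [SomeTerm] -> Maybe SomeTerm
parseSome (c : s) stack
  | isDigit c = parseSome s (SomeTerm (Cnst (digitToInt c)) : stack)
parseSome ('+' : s) (SomeTerm e2 : SomeTerm e1 : stack) = do
  e1' <- gcast e1 -- fails unless e1 has sort IntegralSort
  e2' <- gcast e2
  parseSome s (SomeTerm (Plus e1' e2') : stack)
parseSome ('=' : s) (SomeTerm (e2 :: Term s2) : SomeTerm (e1 :: Term s1) : stack) = do
  Refl <- eqT :: Maybe (s1 :~: s2) -- both sides must have the same sort
  parseSome s (SomeTerm (Equals e1 e2) : stack)
parseSome [] [e] = Just e
parseSome _ _ = Nothing

-- Report the sort of a parsed term by inspecting its head constructor.
sortName :: SomeTerm -> String
sortName (SomeTerm t) = case t of
  Cnst _     -> "Int"
  Plus _ _   -> "Int"
  Equals _ _ -> "Bool"
```

With these definitions, parsing "12+3=" yields a term of sort Bool, while "12=3+" is rejected because '+' is applied to a Bool operand.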
To parse sorts, you want to track them at the type level. Again, use an existential type.
data SomeSort where
  SomeSort :: Typeable (s :: Sort) => proxy s -> SomeSort
You can construct the sort of arrays this way:
-- \i e -> array i e
arraySort :: SomeSort -> SomeSort -> SomeSort
arraySort (SomeSort (Proxy :: Proxy i)) (SomeSort (Proxy :: Proxy e)) =
  SomeSort (Proxy :: Proxy (ArraySort i e))
A potential problem with Typeable here is that it only allows you to test equality of types, when you may want only to check the head constructor: you can't ask "is this type an ArraySort?", but only "is this type equal to ArraySort IntSort BoolSort?" or some other full type.
In that case you need a GADT that reflects the structure of a sort.
-- "Singleton type"
data SSort (s :: Sort) where
  SIntSort :: SSort IntSort
  SBoolSort :: SSort BoolSort
  SArraySort :: SSort i -> SSort e -> SSort (ArraySort i e)
data SomeSort where
  SomeSort :: SSort s -> SomeSort
array :: SomeSort -> SomeSort -> SomeSort
array (SomeSort i) (SomeSort e) = SomeSort (SArraySort i e)
The singletons package provides various facilities for defining and working with these singleton types, though it may be overkill for your use case.
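As a hand-written illustration of what the singleton buys you, the sketch below defines a structural equality test and a head-constructor check on SSort. The names sameSort and isArray are hypothetical helpers, and the Sort type is an assumed three-constructor version, since the question does not show its definition.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}
import Data.Type.Equality ((:~:) (Refl))

-- Assumed definition of Sort, for illustration only.
data Sort = IntSort | BoolSort | ArraySort Sort Sort

data SSort (s :: Sort) where
  SIntSort   :: SSort 'IntSort
  SBoolSort  :: SSort 'BoolSort
  SArraySort :: SSort i -> SSort e -> SSort ('ArraySort i e)

-- Structural equality: success produces a proof that the two sorts agree.
sameSort :: SSort a -> SSort b -> Maybe (a :~: b)
sameSort SIntSort SIntSort = Just Refl
sameSort SBoolSort SBoolSort = Just Refl
sameSort (SArraySort i1 e1) (SArraySort i2 e2) = do
  Refl <- sameSort i1 i2
  Refl <- sameSort e1 e2
  Just Refl
sameSort _ _ = Nothing

-- The question Typeable alone could not answer: "is this some ArraySort?"
isArray :: SSort s -> Bool
isArray (SArraySort _ _) = True
isArray _ = False
```

Pattern matching on the Refl returned by sameSort brings the equality into scope, so the two terms can then be passed to something like Equals.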
My problem is how to combine F-algebra-style recursive type definitions with monadic/applicative-style parsers, in a way that would scale to a realistic programming language.
I have just started with the Expr definition below:
data ExprF a = Plus a a
             | Val Integer
  deriving (Functor, Show)
data Rec f = In (f (Rec f))
type Expr = Rec ExprF
and I am trying to combine it with a parser which uses anamorphisms:
ana :: Functor f => (a -> f a) -> a -> Rec f
ana psi x = In $ fmap (ana psi) (psi x)
parser = ana psi
  where
    psi :: String -> ExprF String
    psi = ???
As far as I could understand, in my example, psi should either parse just an integer, or it should decide that the string is a <expr> + <expr> and then (by recursively calling fmap (ana psi)), it should parse the left-hand side and the right-hand side expressions.
However, (monadic/applicative) parsers don't work like that:
they first attempt parsing the left-hand expression,
the +,
and the right-hand expression
One solution that I see, is to change the type definition for Plus a a to Plus Integer a, such that it reflects the parsing process, however this doesn't seem like the best avenue.
Any suggestions (or reading directions) would be welcome!
If you need a monadic parser, you need a monad in your unfold:
anaM :: (Traversable f, Monad m) => (a -> m (f a)) -> a -> m (Rec f)
anaM psiM x = In <$> (psiM x >>= traverse (anaM psiM))
Then you can write something that parses just one level of an ExprF like this:
parseNum :: Parser Integer
parseNum = -- ...
char :: Char -> Parser Char
char c = -- ...
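parseExprF :: Maybe Integer -> Parser (ExprF (Maybe Integer))
parseExprF (Just n) = pure (Val n)
parseExprF Nothing = do
  n <- parseNum
  -- After a number, either a '+' follows (the right operand is still unparsed)
  -- or the number stands alone.
  (Plus (Just n) Nothing <$ char '+') <|> pure (Val n)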
Given that, you now have your recursive Expr parser:
parseExpr :: Parser Expr
parseExpr = anaM parseExprF Nothing
You will need to have instances of Foldable and Traversable for ExprF, of course, but the compiler can write these for you and they are not themselves recursive.
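To see the pieces working together, here is a minimal runnable sketch. The Parser newtype, parseNum, char and eval are assumptions made for the example (a bare Maybe-based parser rather than any particular library); anaM and parseExprF follow the definitions above.

```haskell
{-# LANGUAGE DeriveTraversable #-}
import Control.Applicative (Alternative (..))
import Data.Char (isDigit)

data ExprF a = Plus a a | Val Integer
  deriving (Functor, Foldable, Traversable, Show)

newtype Rec f = In (f (Rec f))

-- Assumed toy parser: returns the result and the remaining input.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> fmap (\(a, r) -> (f a, r)) (p s)

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing -> Nothing
    Just (f, r) -> fmap (\(a, r') -> (f a, r')) (pa r)

instance Monad Parser where
  Parser p >>= k = Parser $ \s -> case p s of
    Nothing -> Nothing
    Just (a, r) -> runParser (k a) r

instance Alternative Parser where
  empty = Parser (const Nothing)
  Parser p <|> Parser q = Parser $ \s -> maybe (q s) Just (p s)

parseNum :: Parser Integer
parseNum = Parser $ \s -> case span isDigit s of
  ("", _) -> Nothing
  (ds, r) -> Just (read ds, r)

char :: Char -> Parser Char
char c = Parser $ \s -> case s of
  (x : xs) | x == c -> Just (c, xs)
  _ -> Nothing

-- Monadic unfold: each seed is expanded inside the parser monad.
anaM :: (Traversable f, Monad m) => (a -> m (f a)) -> a -> m (Rec f)
anaM psiM x = In <$> (psiM x >>= traverse (anaM psiM))

parseExprF :: Maybe Integer -> Parser (ExprF (Maybe Integer))
parseExprF (Just n) = pure (Val n)
parseExprF Nothing = do
  n <- parseNum
  (Plus (Just n) Nothing <$ char '+') <|> pure (Val n)

parseExpr :: Parser (Rec ExprF)
parseExpr = anaM parseExprF Nothing

-- Small evaluator, only used to inspect the parsed tree.
eval :: Rec ExprF -> Integer
eval (In (Val n)) = n
eval (In (Plus a b)) = eval a + eval b
```

Note that this grammar nests to the right: "1+2+3" becomes Plus 1 (Plus 2 3), because the seed Nothing for the right operand restarts the whole expression parser.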
I'm trying to understand how this:
satisfy :: (Char -> Bool) -> Parser Char
satisfy pred = PsrOf p
  where
    p (c:cs) | pred c = Just (cs, c)
    p _ = Nothing
Is equivalent to this:
satisfy :: (Char -> Bool) -> Parser Char
satisfy pred = do
  c <- anyChar
  if pred c then return c else empty
This is a snippet from some lecture notes on Haskell parsing, which I'm trying to understand:
import Control.Applicative
import Data.Char
import Data.Functor
import Data.List
newtype Parser a = PsrOf (String -> Maybe (String, a))
-- Function from input string to:
-- * Nothing, if failure (syntax error);
-- * Just (unconsumed input, answer), if success.
dePsr :: Parser a -> String -> Maybe (String, a)
dePsr (PsrOf p) = p
-- Monadic Parsing in Haskell uses [] instead of Maybe to support ambiguous
-- grammars and multiple answers.
-- | Use a parser on an input string.
runParser :: Parser a -> String -> Maybe a
runParser (PsrOf p) inp = case p inp of
  Nothing -> Nothing
  Just (_, a) -> Just a
  -- OR: fmap (\(_,a) -> a) (p inp)
-- | Read a character and return. Failure if input is empty.
anyChar :: Parser Char
anyChar = PsrOf p
  where
    p "" = Nothing
    p (c:cs) = Just (cs, c)
-- | Read a character and check against the given character.
char :: Char -> Parser Char
-- char wanted = PsrOf p
-- where
-- p (c:cs) | c == wanted = Just (cs, c)
-- p _ = Nothing
char wanted = satisfy (\c -> c == wanted) -- (== wanted)
-- | Read a character and check against the given predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy pred = PsrOf p
  where
    p (c:cs) | pred c = Just (cs, c)
    p _ = Nothing
-- Could also be:
-- satisfy pred = do
-- c <- anyChar
-- if pred c then return c else empty
instance Monad Parser where
  -- return :: a -> Parser a
  return = pure
  -- (>>=) :: Parser a -> (a -> Parser b) -> Parser b
  PsrOf p1 >>= k = PsrOf q
    where
      q inp = case p1 inp of
        Nothing -> Nothing
        Just (rest, a) -> dePsr (k a) rest
I understand everything up until the last bit of the Monad definition, specifically I don't understand how the following line returns something of type Parser b as is required by the (>>=) definition:
Just (rest, a) -> dePsr (k a) rest
It's difficult for me to grasp what the Monad definition means without an example. Thankfully, we have one in the alternate version of the satisfy function, which uses do-notation (which of course means the Monad is being called). I really don't understand do-notation yet, so here's the desugared version of satisfy:
satisfy pred =
  anyChar >>= (\c ->
    if pred c then return c else empty)
So based on the first line of our (>>=) definition, which is
PsrOf p1 >>= k = PsrOf q
We have anyChar as our PsrOf p1 and (\c -> if pred c then return c else empty) as our k. What I don't get is how, in dePsr (k a) rest, (k a) returns a Parser (at least it should, otherwise calling dePsr on it wouldn't make sense). This is made more confusing by the presence of rest. Even if (k a) returned a Parser, calling dePsr would extract the underlying function from the returned Parser and pass rest to it as an input. This definitely doesn't return something of type Parser b as required by the definition of (>>=). Clearly I'm misunderstanding something somewhere.
OK, maybe this will help. Let's start by putting some points back into dePsr.
dePsr :: Parser a -> String -> Maybe (String, a)
dePsr (PsrOf p) rest = p rest
And let's also write out return: (NB I'm putting in all the points for clarity)
return :: a -> Parser a
return a = PsrOf (\rest -> Just (rest, a))
And now from the Just branch of the (>>=) definition
Just (rest, a) -> dePsr (k a) rest
Let's make sure we agree on what everything is:
rest is the string remaining unparsed after p1 is applied
a is the result of applying p1
k :: a -> Parser b takes the result of the previous parser and makes a new parser
dePsr unwraps a Parser a back into a function String -> Maybe (String, a)
Remember we will wrap this back into a parser again at the top of the function: PsrOf q
So in English, bind (>>=) takes a parser in a and a function from a to a parser in b, and returns a parser in b. The resulting parser is made by wrapping q :: String -> Maybe (String, b) in the Parser constructor PsrOf. Then q, the combined parser, takes a String called inp, applies the function p1 :: String -> Maybe (String, a) that we got from pattern matching against the first parser, and pattern matches on the result. For an error we propagate Nothing (easy). If the first parser had a result, we have two pieces of information: the still unparsed string called rest and the result a. We give a to k, the second parser combinator, and get a Parser b, which we need to unwrap with dePsr to get a function String -> Maybe (String, b) back. That function can be applied to rest for the final result of the combined parsers.
I think the hardest part about reading this is that sometimes we curry the parser function which obscures what is actually happening.
OK, now for the satisfy example:
satisfy pred
  = anyChar >>= (\c -> if pred c then return c else empty)
empty comes from the Alternative instance and is PsrOf (const Nothing), so a parser that always fails.
Let's look at only the successful branches. By substitution of only the successful part:
PsrOf (\(c:cs) -> Just (cs, c)) >>= (\c -> PsrOf (\rest -> Just (rest, c)))
So in the bind (>>=) definition
p1 = \(c:cs) -> Just (cs, c)
k = (\c -> PsrOf (\rest -> Just (rest, c)))
q inp = let Just (rest, a) = p1 inp in dePsr (k a) rest -- again, only the successful branch
Then q becomes
q inp =
  let Just (rest, a) = (\(c:cs) -> Just (cs, c)) inp
  in dePsr ((\c -> PsrOf (\rest -> Just (rest, c))) a) rest
Doing a little β-reduction
q inp =
  let (c:cs) = inp
      rest = cs
      a = c
  in dePsr (PsrOf (\rest -> Just (rest, a))) rest -- dePsr . PsrOf = id
Finally cleaning up some more
q (c:cs) = Just (cs, c)
So if pred is successful, we reduce satisfy back to exactly anyChar, which is exactly what we expect, and exactly what we find in the first example of the question. I will leave it as an exercise to the reader (read: I'm lazy) to prove that if either inp = "" or pred c = False, the outcome is Nothing, as in the first satisfy example.
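The exercise can also be checked by running both definitions side by side. The sketch below is self-contained; the Functor, Applicative and Alternative instances are assumptions (the lecture notes quoted in the question only show the Monad instance), and the names satisfyDirect and satisfyMonadic are introduced here just to keep the two versions apart.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit)

newtype Parser a = PsrOf (String -> Maybe (String, a))

dePsr :: Parser a -> String -> Maybe (String, a)
dePsr (PsrOf p) = p

-- Assumed instances: the obvious ones for this representation.
instance Functor Parser where
  fmap f (PsrOf p) = PsrOf $ \inp -> case p inp of
    Nothing -> Nothing
    Just (rest, a) -> Just (rest, f a)

instance Applicative Parser where
  pure a = PsrOf (\inp -> Just (inp, a))
  PsrOf pf <*> PsrOf pa = PsrOf $ \inp -> case pf inp of
    Nothing -> Nothing
    Just (rest, f) -> case pa rest of
      Nothing -> Nothing
      Just (rest', a) -> Just (rest', f a)

instance Monad Parser where
  PsrOf p1 >>= k = PsrOf q
    where
      q inp = case p1 inp of
        Nothing -> Nothing
        Just (rest, a) -> dePsr (k a) rest

instance Alternative Parser where
  empty = PsrOf (const Nothing)
  PsrOf p <|> PsrOf q = PsrOf $ \inp -> case p inp of
    Nothing -> q inp
    r -> r

anyChar :: Parser Char
anyChar = PsrOf p
  where
    p "" = Nothing
    p (c : cs) = Just (cs, c)

-- Version 1: written directly against the representation.
satisfyDirect :: (Char -> Bool) -> Parser Char
satisfyDirect pred' = PsrOf p
  where
    p (c : cs) | pred' c = Just (cs, c)
    p _ = Nothing

-- Version 2: written with (>>=), return and empty.
satisfyMonadic :: (Char -> Bool) -> Parser Char
satisfyMonadic pred' = anyChar >>= \c -> if pred' c then return c else empty
```

Both versions give Just ("a", '1') on "1a" and Nothing on "a1" or on the empty string.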
NOTE: If you are doing anything other than a class assignment, you will save yourself hours of pain and frustration by building in error handling from the beginning. Make your parser String -> Either String (String, a); it is easy to make the error type more general later, but a PITA to change everything from Maybe to Either.
Question: "[C]ould you explain how you arrived at return a = PsrOf (\rest -> Just (rest, a)) from return = pure after you put "points" back into return?"
Answer: First off, it is pretty unfortunate to give the Monad instance definition without the Functor and Applicative definitions. The pure and return functions must be identical (it is part of the Monad laws), and they would be called the same thing except Monad far predates Applicative in Haskell history. In point of fact, I don't "know" what pure looks like, but I know what it has to be because it is the only possible definition. (If you want to understand the proof of that statement, ask. I have read the papers, and I know the results, but I'm not into typed lambda calculus quite enough to be confident in reproducing the results.)
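A minimal sketch of what the Either-based variant could look like follows. The error messages, the failWith helper and the extra description argument to satisfy are all assumptions made for illustration, not part of the lecture notes.

```haskell
import Data.Char (isDigit)

-- Left carries an error message; Right carries (unconsumed input, answer).
newtype Parser a = Parser { runParser :: String -> Either String (String, a) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \inp -> fmap (\(rest, a) -> (rest, f a)) (p inp)

instance Applicative Parser where
  pure a = Parser $ \inp -> Right (inp, a)
  Parser pf <*> Parser pa = Parser $ \inp -> do
    (rest, f) <- pf inp
    (rest', a) <- pa rest
    Right (rest', f a)

instance Monad Parser where
  Parser p >>= k = Parser $ \inp -> do
    (rest, a) <- p inp
    runParser (k a) rest

-- Hypothetical helper: fail with a message instead of a bare Nothing.
failWith :: String -> Parser a
failWith msg = Parser $ \_ -> Left msg

anyChar :: Parser Char
anyChar = Parser $ \inp -> case inp of
  "" -> Left "unexpected end of input"
  (c : cs) -> Right (cs, c)

-- The first argument describes what was expected, for the error message.
satisfy :: String -> (Char -> Bool) -> Parser Char
satisfy what p = do
  c <- anyChar
  if p c then return c else failWith ("expected " ++ what ++ ", got " ++ show c)

digit :: Parser Char
digit = satisfy "digit" isDigit
```

The shape of (>>=) is unchanged from the Maybe version; only the failure case now says why it failed.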
return must wrap a value in the context without altering the context.
return :: Monad m => a -> m a
return :: a -> Parser a -- for our Monad
return :: a -> PsrOf (\str -> Maybe (rest, value)) -- substituting the constructor (pseudocode)
A Parser is a function that takes a string to be parsed and returns Just the value along with any unparsed portion of the original string, or Nothing on failure, all wrapped in the constructor PsrOf. The context is the string to be parsed, so we cannot change that. The value is of course what was passed to return. The parser always succeeds, so we must return Just a value.
return a = PsrOf (\rest -> Just (rest, a))
rest is the context and it is passed through unaltered.
a is the value we put into the Monad context.
For completeness here is also the only reasonable definition of fmap from Functor.
fmap :: Functor f => (a -> b) -> f a -> f b
fmap :: (a -> b) -> Parser a -> Parser b -- for Parser Monad
fmap f (PsrOf p) = PsrOf q
  where q inp = case p inp of
          Nothing -> Nothing
          Just (rest, a) -> Just (rest, f a)
        -- better but less instructive definition of q
        -- q = fmap (\(rest,a) -> (rest, f a)) . p
Traditionally, arithmetic operators are considered to be binary (left or right associative), thus most tools are dealing only with binary operators.
Is there an easy way to parse arithmetic operators with Parsec, which can have an arbitrary number of arguments?
For example, the following expression should be parsed into the tree
(a + b) + c + d * e + f
Yes! The key is to first solve a simpler problem, which is to model + and * as tree nodes with only two children. To add four things, we'll just use + three times.
This is a great problem to solve since there's a Text.Parsec.Expr module for just this problem. Your example is actually parseable by the example code in the documentation. I've slightly simplified it here:
module Lib where
import Text.Parsec
import Text.Parsec.Language
import qualified Text.Parsec.Expr as Expr
import qualified Text.Parsec.Token as Tokens
data Expr =
    Identifier String
  | Multiply Expr Expr
  | Add Expr Expr
instance Show Expr where
  show (Identifier s) = s
  show (Multiply l r) = "(* " ++ (show l) ++ " " ++ (show r) ++ ")"
  show (Add l r) = "(+ " ++ (show l) ++ " " ++ (show r) ++ ")"
-- Some sane parser combinators that we can plagiarize from the Haskell parser.
parens = Tokens.parens haskell
identifier = Tokens.identifier haskell
reserved = Tokens.reservedOp haskell
-- Infix parser.
infix_ operator func =
  Expr.Infix (reserved operator >> return func) Expr.AssocLeft
parser =
  Expr.buildExpressionParser table term <?> "expression"
table = [[infix_ "*" Multiply], [infix_ "+" Add]]
term =
  parens parser
  <|> (Identifier <$> identifier)
  <?> "term"
Running this in GHCi:
λ> runParser parser () "" "(a + b) + c + d * e + f"
Right (+ (+ (+ (+ a b) c) (* d e)) f)
There are lots of ways of converting this tree to the desired form. Here's a hacky gross slow one:
data Expr' =
    Identifier' String
  | Add' [Expr']
  | Multiply' [Expr']
  deriving (Show)
-- Collect the operands of a chain of nodes accepted by the predicate f.
collect :: Expr -> (Expr -> Bool) -> [Expr]
collect e f | not (f e) = [e]
collect (Add l r) f =
  collect l f ++ collect r f
collect (Multiply l r) f =
  collect l f ++ collect r f
collect e _ = [e]
isAdd :: Expr -> Bool
isAdd (Add _ _) = True
isAdd _ = False
isMultiply :: Expr -> Bool
isMultiply (Multiply _ _) = True
isMultiply _ = False
optimize :: Expr -> Expr'
optimize (Identifier s) = Identifier' s
optimize e@(Add _ _) = Add' (map optimize (collect e isAdd))
optimize e@(Multiply _ _) = Multiply' (map optimize (collect e isMultiply))
I will note, however, that almost always Expr is Good Enough™ for the purposes of a parser or compiler.
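If the n-ary form is wanted anyway, the flattening step can be tried without Parsec by building the binary tree by hand. The sketch below is a slightly tidier variant of the conversion above; flatten, addends and factors are hypothetical names, and Eq is derived on Expr' only so that results can be compared.

```haskell
data Expr
  = Identifier String
  | Multiply Expr Expr
  | Add Expr Expr

data Expr'
  = Identifier' String
  | Add' [Expr']
  | Multiply' [Expr']
  deriving (Eq, Show)

-- Operands of a maximal chain of Add nodes.
addends :: Expr -> [Expr]
addends (Add l r) = addends l ++ addends r
addends e = [e]

-- Operands of a maximal chain of Multiply nodes.
factors :: Expr -> [Expr]
factors (Multiply l r) = factors l ++ factors r
factors e = [e]

flatten :: Expr -> Expr'
flatten (Identifier s) = Identifier' s
flatten e@(Add _ _) = Add' (map flatten (addends e))
flatten e@(Multiply _ _) = Multiply' (map flatten (factors e))

-- The tree shown in the GHCi session: (+ (+ (+ (+ a b) c) (* d e)) f)
example :: Expr
example =
  Add
    (Add (Add (Add (Identifier "a") (Identifier "b")) (Identifier "c"))
         (Multiply (Identifier "d") (Identifier "e")))
    (Identifier "f")
```

Here flatten example is an Add' node with five children: a, b, c, the product of d and e, and f. The parentheses around a + b are not preserved, since the binary tree no longer records them.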
I really hate asking this kind of question but I'm at the end of my wits here. I am writing an incremental parser but for some reason, just cannot figure out how to implement functor instance for it. Here's the code dump:
Input Data Type
Input is data type yielded by parser to the coroutine. It contains the current list of input chars being operated on by coroutine and end of line condition
data Input a = S [a] Bool deriving (Show)
instance Functor Input where
  fmap g (S as x) = S (g <$> as) x
Output Data Type
Output is data type yielded by coroutine to Parser. It is either a Failed message, Done [b], or Partial ([a] -> Output a b), where [a] is the current buffer passed back to the parser
data Output a b = Fail String | Done [b] | Partial ([a] -> Output a b)
instance Functor (Output a) where
  fmap _ (Fail s) = Fail s
  fmap g (Done bs) = Done $ g <$> bs
  fmap g (Partial f) = Partial $ \as -> g <$> f as
The Parser
The parser takes [a] and yields a buffer [a] to coroutine, which yields back Output a b
data ParserI a b = PP { runPi :: [a] -> (Input a -> Output a b) -> Output a b }
Functor Implementation
It seems like all I have to do is fmap the function g onto the coroutine, like follows:
instance Functor (ParserI a) where
  fmap g p = PP $ \as k -> runPi p as (\xs -> fmap g $ k xs)
But it does not type check:
Couldn't match type `a1' with `b'
`a1' is a rigid type variable bound by
the type signature for
fmap :: (a1 -> b) -> ParserI a a1 -> ParserI a b
at Tests.hs:723:9
`b' is a rigid type variable bound by
the type signature for
fmap :: (a1 -> b) -> ParserI a a1 -> ParserI a b
at Tests.hs:723:9
Expected type: ParserI a b
Actual type: ParserI a a1
As Philip JF declared, it's not possible to have an instance Functor (ParserI a). The proof goes by variance of functors—any (mathematical) functor must, for each of its arguments, be either covariant or contravariant. Normal Haskell Functors are always covariant which is why
fmap :: (a -> b) -> (f a -> f b)
Haskell Contravariant functors have the similar
contramap :: (b -> a) -> (f a -> f b)
In your case, the b index in ParserI a b would have to be both covariant and contravariant. The quick way of figuring this out is to relate covariant positions to + and contravariant to - and build from some basic rules.
Covariant positions are function results, contravariant are function inputs. So a type mapping like type Func1 a b c = (a, b) -> c has a ~ -, b ~ -, and c ~ +. If you have functions in output positions, you multiply all of the argument variances by +1. If you have functions in input positions you multiply all the variances by -1. Thus
type Func2 a b c = a -> (b -> c)
has the same variances as Func1 but
type Func3 a b c = (a -> b) -> c
has a ~ +, b ~ -, and c ~ +. Using these rules you can pretty quickly see that Output has variances like Output - +, and that ParserI then uses Output in both negative and positive positions, so it can't be a straight-up Functor.
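The sign rules can be checked against three small newtypes. Producer, Consumer and Callback are hypothetical names chosen for this illustration; Contravariant is the class from Data.Functor.Contravariant in base.

```haskell
import Data.Functor.Contravariant (Contravariant (..))

-- a occurs only as a function result: covariant (+).
newtype Producer r a = Producer (r -> a)

instance Functor (Producer r) where
  fmap f (Producer g) = Producer (f . g)

-- a occurs only as a function input: contravariant (-).
newtype Consumer r a = Consumer (a -> r)

instance Contravariant (Consumer r) where
  contramap f (Consumer g) = Consumer (g . f)

-- a occurs as the input of an input: (-) * (-) = (+), so covariant again.
newtype Callback r a = Callback ((a -> r) -> r)

instance Functor (Callback r) where
  fmap f (Callback g) = Callback (\k -> g (k . f))

runProducer :: Producer r a -> r -> a
runProducer (Producer g) = g

runConsumer :: Consumer r a -> a -> r
runConsumer (Consumer g) = g

runCallback :: Callback r a -> (a -> r) -> r
runCallback (Callback g) = g
```

A type whose parameter shows up with both signs, as b does in ParserI, admits neither instance.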
But there are generalizations like Contravariant. The particular generalization of interest is Profunctor (or Difunctors which you see sometimes) which goes like so
class Profunctor f where
  promap :: (a' -> a) -> (b -> b') -> (f a b -> f a' b')
the quintessential example of which being (->)
instance Profunctor (->) where
  promap f g orig = g . orig . f
i.e. it "extends" the function both after (like a usual Functor) and before. Profunctors f are thus always mathematical functors of arity 2 with variance signature f - +.
So, by generalizing your ParserI slightly, letting there be an extra parameter to split the output types in half, we can make it a Profunctor.
data ParserIC a b b' = PP { runPi :: [a] -> (Input a -> Output a b) -> Output a b' }
instance Profunctor (ParserIC a) where
  promap before after (PP pi) =
    PP $ \as k -> fmap after $ pi as (fmap before . k)
and then you can wrap it up
type ParserI a b = ParserIC a b b
and provide a slightly less convenient mapping function over b
mapPi :: (c -> b) -> (b -> c) -> ParserI a b -> ParserI a c
mapPi = promap
which really drives home the burden of having the variances go both ways---you need to have bidirectional maps!
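Put together, the profunctor version can be exercised as follows. This sketch defines the Profunctor class locally with the promap name used above (the profunctors package calls the same operation dimap); feedAll and doneValues are hypothetical helpers added only to have something to run.

```haskell
data Input a = S [a] Bool

data Output a b = Fail String | Done [b] | Partial ([a] -> Output a b)

instance Functor (Output a) where
  fmap _ (Fail s) = Fail s
  fmap g (Done bs) = Done (map g bs)
  fmap g (Partial f) = Partial (fmap g . f)

-- Local copy of the class, so the example needs no extra packages.
class Profunctor f where
  promap :: (a' -> a) -> (b -> b') -> f a b -> f a' b'

newtype ParserIC a b b' = PP { runPi :: [a] -> (Input a -> Output a b) -> Output a b' }

instance Profunctor (ParserIC a) where
  -- `before` adapts what the continuation produces, `after` adapts the final output.
  promap before after (PP p) = PP $ \as k -> fmap after (p as (fmap before . k))

-- Hypothetical parser: hands the whole buffer to the continuation as final input.
feedAll :: ParserIC a b b
feedAll = PP $ \as k -> k (S as True)

-- Hypothetical helper to inspect a finished Output.
doneValues :: Output a b -> Maybe [b]
doneValues (Done bs) = Just bs
doneValues _ = Nothing
```

For example, promap length show feedAll run on "abc" with the continuation \(S xs _) -> Done [xs] finishes with Done ["3"]: the continuation's String is turned into an Int by length on the way in, and back into a String by show on the way out.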
tl;dr, How do I implement parsers whose backtracking can be restricted, where the parsers are monad transformer stacks?
I haven't found any papers, blogs, or example implementations of this approach; it seems the typical approach to restricting backtracking is a datatype with additional constructors, or the Parsec approach where backtracking is off by default.
My current implementation -- using a commit combinator, see below -- is wrong; I'm not sure about the types, whether it belongs in a type class, and my instances are less generic than it feels like they should be.
Can anyone describe how to do this cleanly, or point me to resources?
I've added my current code below; sorry for the post being so long!
The stack, from top to bottom (as in the Parser type below):
StateT [t]
MaybeT or ListT
Either e
The intent is that backtracking operates in the middle layer -- a Nothing or an empty list wouldn't necessarily yield an error, it'd just mean that a different branch should be tried -- whereas the bottom layer is for errors (with some contextual information) that immediately abort the parsing.
{-# LANGUAGE NoMonomorphismRestriction, FunctionalDependencies,
FlexibleInstances, UndecidableInstances #-}
import Control.Monad.Trans.State (StateT(..))
import Control.Monad.State.Class (MonadState(..))
import Control.Monad.Trans.Maybe (MaybeT(..))
import Control.Monad.Trans.List (ListT(..))
import Control.Monad (MonadPlus(..), guard)
type Parser e t mm a = StateT [t] (mm (Either e)) a
newtype DParser e t a =
  DParser {getDParser :: Parser e t MaybeT a}
-- Note: GHC 7.10 and later additionally require Functor, Applicative
-- and Alternative instances for DParser.
instance Monad (DParser e t) where
  return = DParser . return
  (DParser d) >>= f = DParser (d >>= (getDParser . f))
instance MonadPlus (DParser e t) where
  mzero = DParser (StateT (const (MaybeT (Right Nothing))))
  mplus = undefined -- will worry about later
instance MonadState [t] (DParser e t) where
  get = DParser get
  put = DParser . put
A couple of parsing classes:
class (Monad m) => MonadParser t m n | m -> t, m -> n where
  item :: m t
  parse :: m a -> [t] -> n (a, [t])
class (Monad m, MonadParser t m n) => CommitParser t m n where
  commit :: m a -> m a
Their instances:
instance MonadParser t (DParser e t) (MaybeT (Either e)) where
  item =
    get >>= \xs -> case xs of
      (y:ys) -> put ys >> return y
      [] -> mzero
  parse = runStateT . getDParser
instance CommitParser t (DParser [t] t) (MaybeT (Either [t])) where
  commit p =
    DParser (
      StateT (\ts -> MaybeT $ case runMaybeT (parse p ts) of
        Left e -> Left e
        Right Nothing -> Left ts
        Right (Just x) -> Right (Just x)))
And a couple more combinators:
satisfy f =
  item >>= \x ->
  guard (f x) >>
  return x
literal x = satisfy (== x)
Then these parsers:
ab = literal 'a' >> literal 'b'
ab' = literal 'a' >> commit (literal 'b')
give these results:
> myParse ab "abcd"
Right (Just ('b',"cd")) -- succeeds
> myParse ab' "abcd"
Right (Just ('b',"cd")) -- 'commit' doesn't affect success
> myParse ab "acd"
Right Nothing -- <== failure but not an error
> myParse ab' "acd"
Left "cd" -- <== error b/c of 'commit'
The answer appears to be in the MonadOr type class (which unfortunately for me is not part of the standard libraries):
class MonadZero m => MonadOr m where
  morelse :: m a -> m a -> m a
satisfying Monoid and Left Catch:
morelse mzero b = b
morelse a mzero = a
morelse (morelse a b) c = morelse a (morelse b c)
morelse (return a) b = return a
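To make the three outcomes concrete without any transformer imports, here is a minimal sketch that writes the stack out by hand as one newtype. It is not the MonadOr class itself: orElse is a hypothetical stand-in for morelse, failP for mzero, and commit takes an explicit function for building the error from the remaining input, which is an assumption of this example.

```haskell
-- Left e         : hard error, aborts parsing
-- Right Nothing  : soft failure, another branch may be tried
-- Right (Just _) : success with the remaining input
newtype Parser e t a = Parser { runParser :: [t] -> Either e (Maybe (a, [t])) }

instance Functor (Parser e t) where
  fmap f (Parser p) = Parser $ \ts -> fmap (fmap (\(a, r) -> (f a, r))) (p ts)

instance Applicative (Parser e t) where
  pure a = Parser $ \ts -> Right (Just (a, ts))
  pf <*> pa = pf >>= \f -> fmap f pa

instance Monad (Parser e t) where
  Parser p >>= k = Parser $ \ts -> case p ts of
    Left e -> Left e
    Right Nothing -> Right Nothing
    Right (Just (a, rest)) -> runParser (k a) rest

-- Soft failure (plays the role of mzero).
failP :: Parser e t a
failP = Parser $ \_ -> Right Nothing

-- Left-biased choice: only a soft failure falls through to the second branch,
-- so `orElse (pure a) b` behaves like `pure a` (the Left Catch law).
orElse :: Parser e t a -> Parser e t a -> Parser e t a
orElse (Parser p) (Parser q) = Parser $ \ts -> case p ts of
  Right Nothing -> q ts
  other -> other

-- Turn a soft failure into a hard error built from the remaining input.
commit :: ([t] -> e) -> Parser e t a -> Parser e t a
commit mkErr (Parser p) = Parser $ \ts -> case p ts of
  Right Nothing -> Left (mkErr ts)
  other -> other

item :: Parser e t t
item = Parser $ \ts -> case ts of
  (y : ys) -> Right (Just (y, ys))
  [] -> Right Nothing

satisfy :: (t -> Bool) -> Parser e t t
satisfy f = item >>= \x -> if f x then pure x else failP

literal :: Eq t => t -> Parser e t t
literal x = satisfy (== x)

ab, ab' :: Parser String Char Char
ab = literal 'a' >> literal 'b'
ab' = literal 'a' >> commit id (literal 'b')
```

On "acd", ab fails softly, so ab `orElse` literal 'a' backtracks and succeeds with 'a', whereas ab' `orElse` literal 'a' stops with Left "cd" because the committed branch has already raised an error.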