Parse error when using case expression inside guards (Haskell, parsing)

I am new to Haskell and I am having issues with syntax. What I want to do is: given a value and a tree of this datatype, find the path to the corresponding node in the tree. I believe my logic for the function is correct, but I am not sure how to make it valid Haskell. I have tried changing tabs to spaces.
-- | Binary trees with nodes labeled by values of an arbitrary type.
data Tree a
    = Node a (Tree a) (Tree a)
    | End
  deriving (Eq,Show)
-- | One step in a path, indicating whether to follow the left subtree (L)
-- or the right subtree (R).
data Step = L | R
  deriving (Eq,Show)
-- | A path is a sequence of steps. Each node in a binary tree can be
-- identified by a path, indicating how to move down the tree starting
-- from the root.
type Path = [Step]
pathTo :: Eq a => a -> Tree a -> Maybe Path
pathTo a End = Nothing
pathTo a (Node b l r)
  | a == b = Just []
  | case (pathTo a l) of
      Just p  -> Just [L:p]
      Nothing -> case (pathTo a r) of
        Just p  -> Just [R:p]
        Nothing -> Nothing
This is the error:
parse error (possibly incorrect indentation or mismatched brackets)

The underlying problem here is that this does not look like a guard: a guard is an expression of type Bool, which determines whether the guard "fires" or not. Here this is likely `otherwise`:
pathTo :: Eq a => a -> Tree a -> Maybe Path
pathTo a End = Nothing
pathTo a (Node b l r)
  | a == b = Just []
  | otherwise = case (pathTo a l) of
      Just p  -> Just (L:p)
      Nothing -> case (pathTo a r) of
        Just p  -> Just (R:p)
        Nothing -> Nothing
This also reveals some extra mistakes: Just [L:p] is a Maybe [[Step]]; you likely wanted Just (L:p), and the same applies to Just [R:p].
Furthermore, you do not need nested cases; you can work with the Alternative typeclass:
import Control.Applicative((<|>))
pathTo :: Eq a => a -> Tree a -> Maybe Path
pathTo a End = Nothing
pathTo a (Node b l r)
  | a == b    = Just []
  | otherwise = ((L:) <$> pathTo a l) <|> ((R:) <$> pathTo a r)
Here x <|> y will take x if it is a Just …, and y otherwise. We use (L:) <$> … to prepend L to the list wrapped in the Just data constructor, or return Nothing in case … is Nothing.
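For example, a quick GHCi check with a small tree I made up (both the case-based and the Alternative-based versions give the same results):
ghci> pathTo 3 (Node 1 (Node 2 End End) (Node 3 End End))
Just [R]
ghci> pathTo 2 (Node 1 (Node 2 End End) (Node 3 End End))
Just [L]
ghci> pathTo 4 (Node 1 (Node 2 End End) (Node 3 End End))
Nothing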

Related

Taking two streams and combining them in OCaml

I want to take two streams of integers in increasing order and combine them into one stream that contains no duplicates and should be in increasing order. I have defined the functionality for streams in the following manner:
type 'a susp = Susp of (unit -> 'a)
let force (Susp f) = f()
type 'a str = {hd : 'a ; tl : ('a str) susp }
let merge s1 s2 = (* must implement *)
The first function suspends computation by wrapping a computation within a function, and the second function evaluates the function and provides me with the result of the computation.
I want to emulate the logic of how you go about combining lists, i.e. match on both lists and check which elements are greater, lesser, or equal and then append (cons) the integers such that the resulting list is sorted.
However, I know I cannot just do this with streams of course as I cannot traverse it like a list, so I think I would need to go integer by integer, compare, and then suspend the computation and keep doing this to build the resulting stream.
I am at a bit of a loss how to implement such logic however, assuming it is how I should be going about this, so if somebody could point me in the right direction that would be great.
Thank you!
If the input sequences are sorted, there is not much difference between merging lists and sequences. Consider the following merge function on lists:
let rec merge s t =
  match s, t with
  | x :: s , [] | [], x :: s -> x :: s
  | [], [] -> s
  | x :: s', y :: t' ->
    if x < y then
      x :: (merge s' t)
    else if x = y then
      x :: (merge s' t')
    else
      y :: (merge s t')
This function is only using two properties of lists:
the ability to split the potential first element from the rest of the list
the ability to add an element to the front of the list
This suggests that we could rewrite this function as a functor over the signature
module type seq = sig
  type 'a t
  (* if the seq is non-empty we split the seq into head and tail *)
  val next: 'a t -> ('a * 'a t) option
  (* add back to the front *)
  val cons: 'a -> 'a t -> 'a t
end
Then if we replace the pattern matching on the list with a call to next, and the cons operation with a call to cons, the previous function is transformed into:
module Merge(Any_seq: seq) = struct
  open Any_seq
  let rec merge s t =
    match next s, next t with
    | Some(x,s), None | None, Some (x,s) ->
      cons x s
    | None, None -> s
    | Some (x,s'), Some (y,t') ->
      if x < y then
        cons x (merge s' t)
      else if x = y then
        cons x (merge s' t')
      else
        cons y (merge s t')
end
Then, with lists, our implementation is:
module List_core = struct
  type 'a t = 'a list
  let cons = List.cons
  let next = function
    | [] -> None
    | a :: q -> Some(a,q)
end
module List_implem = Merge(List_core)
which can be tested with
let test = List_implem.merge [1;5;6] [2;4;9]
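Evaluating that in the OCaml toplevel should give the merged, duplicate-free list (my own check, not from the original answer):
# List_implem.merge [1;5;6] [2;4;9];;
- : int list = [1; 2; 4; 5; 6; 9]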
Implementing the same function for your stream type is then just a matter of writing a similar Stream_core module for stream.

Parser Error Reporting deriving the right instances

I am trying to build an error-reporting parser in Haskell. Currently I am working from a tutorial, and this is what I have so far:
type Position = (Int, Int)
type Err = (String, Position)
newtype Parser1 a = Parser1 {parse1 :: StateT String (StateT Position (MaybeT
                    (Either Err))) a} deriving (Monad, MonadState String, Applicative, Functor)
runParser :: Parser1 a -> String -> Either Err (Maybe ((a, String), Position))
runParser p ts = runMaybeT $ runStateT (runStateT (parse1 p) ts) (0, 0)
basicItem = Parser1 $ do
  state <- get
  case state of
    (x:xs) -> do {put xs; return x}
    []     -> empty
item = Parser1 $ do
  c <- basicItem
  pos <- lift get
  lift (put (f pos))
  return c
f :: Char -> Position -> Position
f d (ln, c) = (ln + 1, 0)
f _ (ln, c) = (ln, c + 1)
This piece of code does not compile; I think it has to do with my item parser and the fact that I am trying to access the inner state, namely the position. How, in the deriving clause, do I make Haskell derive the instances for both states in my parser type, so that I can access the inner state?
Edit 1:
I initially tried declaring basicItem as:
basicItem :: (MonadState String m, Alternative m) => m t
basicItem = do
  state <- get
  case state of
    (x:xs) -> do {put xs; return x}
    []     -> empty
However, I kept getting an error, and I was wondering why it cannot deduce the context of get from MonadState String m, when in my deriving clause I have MonadState String.

Generating a parser with `inverse`, with constraints on the grammar

I recently worked through A Taste of Curry, and afterwards decided to put the trivial arithmetic parser example to the test by writing a somewhat more substantial parser: a primitive but correct and functional HTML parser.
I ended up with a working node2string function to operate on Node (with attributes and children), which I then inverted to obtain a parse function, as exemplified in the article.
The first naive implementation had the flaw that only the trivial <input/> kind of HTML snippet was parsed into exactly one Node representation; everything else nondeterministically yielded invalid things like
Node { name = "input", attrs = [Attr "type" "submit"] }
Node { name = "input type=\"submit\"", attrs = [] }
and so on.
After some initial naive attempts to fix that from within node2string, I realized the point, which I believe all seasoned logic programmers see instantaneously, that parse = inverse node2string was more right and insightful about the situation than I was: the above 2 parse results of <input type="submit"/> were indeed exactly the 2 valid and constructible values of Node that would lead to that HTML representation.
I realized I had to constrain Node to only allow passing in alphabetic (well, not really, but let's keep it simple) names (and of course the same for Attr). In a less fundamental setting than a logic program (such as regular Haskell, with much more hand-written and "instructional" as opposed to purely declarative programming), I would simply have hidden the Node constructor behind e.g. a mkNode sentinel function, but I have the feeling this wouldn't work well in Curry due to how the inference engine or constraint solver works (I might be wrong on this, and in fact I hope I am).
So I ended up instead with the following. I think Curry metaprogramming (or Template Haskell, if Curry supported it) could be used to clean up the manual boilerplate, but cosmetics are only one way out of the situation.
data Name = Name [NameChar] -- newtype crashes the compiler
data NameChar = A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
name2char :: NameChar -> Char
name2char c = case c of A -> 'a'; B -> 'b'; C -> 'c'; D -> 'd'; E -> 'e'; F -> 'f'; G -> 'g'; H -> 'h'; I -> 'i'; J -> 'j'; K -> 'k'; L -> 'l'; M -> 'm'; N -> 'n'; O -> 'o'; P -> 'p'; Q -> 'q'; R -> 'r'; S -> 's'; T -> 't'; U -> 'u'; V -> 'v'; W -> 'w'; X -> 'x'; Y -> 'y'; Z -> 'z'
name2string :: Name -> String
name2string (Name s) = map name2char s
-- for "string literal" support
nameFromString :: String -> Name
nameFromString = inverse name2string
data Node = Node { nodeName :: Name, attrs :: [Attr], children :: [Node] }
data Attr = Attr { attrName :: Name, value :: String }
attr2string :: Attr -> String
attr2string (Attr name value) = name2string name ++ "=\"" ++ escape value ++ "\""
  where escape = concatMap (\c -> if c == '"' then "\\\"" else [c])
node2string :: Node -> String
node2string (Node name attrs children)
  | null children = "<" ++ name' ++ attrs' ++ "/>"
  | otherwise     = "<" ++ name' ++ attrs' ++ ">" ++ children' ++ "</" ++ name' ++ ">"
  where name'     = name2string name
        attrs'    = concatMap ((" " ++) . attr2string) attrs
        children' = intercalate "" $ map node2string children
inverse :: (a -> b) -> (b -> a)
inverse f y | f x =:= y = x where x free
parse :: String -> Node
parse = inverse node2string
This, in fact, works perfectly (in my judgement):
Parser> parse "<input type=\"submit\"/>"
(Node [I,N,P,U,T] [(Attr [T,Y,P,E] "submit")] [])
Parser> parse "<input type=\"submit\" name=\"btn1\"/>"
(Node [I,N,P,U,T] [(Attr [T,Y,P,E] "submit"),(Attr [N,A,M,E] "btn1")] [])
(Curry doesn't have type classes so I wouldn't know yet how to make [NameChar] print more nicely)
However, my question is:
is there a way to use something like isAlpha (or a function more true to the actual HTML spec, of course) to achieve a result equivalent to this, without having to go through the verbose boilerplate that NameChar and its "supporting members" are? There seems to be no way to even place the "functional restriction" anywhere within the ADT.
In a Dependently Typed Functional Logic Programming language, I would just express the constraint at the type level and let the inference engine or constraint solver deal with it, but here I seem to be at a loss.
You can achieve the same results using just Char. As you've already pointed out, you can use isAlpha to define name2char as a partial identity. I changed the following lines of your code.
type NameChar = Char
name2char :: NameChar -> Char
name2char c | isAlpha c = c
The two exemplary expressions then evaluate as follows.
test> parse "<input type=\"submit\" name=\"btn1\"/>"
(Node (Name "input") [(Attr (Name "type") "submit"),(Attr (Name "name") "btn1")] [])
test> parse "<input type=\"submit\"/>"
(Node (Name "input") [(Attr (Name "type") "submit")] [])
As a side-effect, names with non-alpha characters silently fail with nameFromString.
test> nameFromString "input "
Edit: Since you seem to be a fan of function patterns, you can define generators for Nodes and Attrs and use them in your conversion function.
attr :: Name -> String -> Attr
attr name val
  | name `elem` ["type", "src", "alt", "name"] = Attr name val
node :: String -> [Attr] -> [Node] -> Node
node name [] nodes
  | name `elem` ["a", "p"] = Node name [] nodes
node name attrPairs@(_:_) nodes
  | name `elem` ["img", "input"] = Node name attrPairs nodes
node2string :: Node -> String
node2string (node name attrs children)
  | null children = "<" ++ name ++ attrs' ++ "/>"
  | otherwise     = "<" ++ name ++ attrs' ++ ">"
                    ++ children' ++ "</" ++ name' ++ ">"
  where
    name' = name
    attrs' = concatMap ((" " ++) . attr2string) attrs
    children' = intercalate "" $ map node2string children
attr2string :: Attr -> String
attr2string (attr name val) = name ++ "=\"" ++ escape val ++ "\""
  where
    escape = concatMap (\c -> if c == '"' then "\\\"" else [c])
This approach has its disadvantages; it works quite well for a specific set of valid names, but fails miserably when you use a predicate like before (e.g., all isAlpha name).
Edit2:
Besides the fact that the solution with the isAlpha condition is arguably prettier than your verbose solution, it is also defined in a declarative way.
Without your comments, it does not become clear (that easily) that you are encoding alphabetic characters with your NameChar data type. The isAlpha condition, on the other hand, is a good example of a declarative specification of the desired property.
Does this answer your question? I'm not sure what you are aiming at.

Monadic parsing functional pearl - gluing multiple parsers together

I am working my way through the functional pearl paper Monadic parsing in Haskell (after a recommendation at haskellforall.com to read that paper to understand parsing). I wrote an implementation up to section 4 on page 3, as below:
newtype Parser a = Parser (String -> [(a,String)])
parse (Parser p) = p
instance Monad Parser where
  return a = Parser (\cs -> [(a,cs)])
  p >>= f  = Parser (\cs -> concat [parse (f a) cs' | (a,cs') <- parse p cs])
item :: Parser Char
item = Parser (\cs -> case cs of
                        "" -> []
                        (c:cs) -> [(c,cs)])
p :: Parser (Char,Char)
p = do { a <- item; item; b <- item; return (a,b)}
According to the paper, p is a parser that consumes three characters, skips the middle one, and returns a pair of the first and third. What I can't figure out is how the modified input string is passed to the 2nd and 3rd occurrences of item in p. We are not passing the result of the first parser to the second parser, and so on (because ;, syntactic sugar for >>, is used, which discards the result, as shown by the type signature (>>) :: Monad m => m a -> m b -> m b). I would appreciate an explanation of how the modified input is passed in the last two invocations of item in p.
Another thing that confuses me is the handling of cs in item - it doesn't return a (head, tail) pair. Shouldn't it be redefined as follows, since the item parser consumes one character according to the paper:
item :: Parser Char
item = Parser (\cs -> case cs of
"" -> []
(c:cs') -> [(c,cs')]) -- redefinition - use cs' to denote tail
The syntax ; is not always syntactic sugar for >>.
Rather, we have:
do { m ; n }      =  m >> n
do { x <- m ; n } =  m >>= \x -> n
(The above translation is simplified; the full gory details can be found in the Haskell Report.)
So, your definition for p is equivalent to:
p = item >>= \a -> ( item >> (item >>= \b -> return (a,b) ))
Here, you can see that the first and third items do not have their results discarded (because >>= binds them to a and b respectively), while the middle item does.
Also note that the code
\cs -> case cs of
"" -> []
(c:cs) -> [(c,cs)]
is misleading since it is defining variable cs twice: once in the \cs and once in the
pattern (c:cs). It is equivalent to
\cs -> case cs of
"" -> []
(x:xs) -> [(x,xs)]
This clarifies that the final String in the output is not the original cs, but rather its tail xs.
In a comment, the poster wondered why the three uses of item do not return the same result, i.e., why in return (a,b) the character a is not equal to b. This is due to the >>= monadic operator, which in this Parser monad automatically feeds the output string xs of each item occurrence to the next one. Indeed, the whole point of this monad is to feed the "leftover" output of each parser as the "to-be-consumed" input of the next one. This has two advantages: it frees the programmer from having to write code to pass this string around, and it ensures that the string is not accidentally "rewound" to a previous state. To illustrate the latter point, here's some wrong code:
let [(c1,s1)] = someParser someInitialString
    [(c2,s2)] = anotherParser1 s1
    [(c3,s3)] = anotherParser2 s2
    [(c4,s4)] = anotherParser3 s3
    [(c5,s5)] = anotherParser4 s2 -- Whoops! Should have been s4
in [c1,c2,c3,c4,c5]
In the last step the string, after having been consumed multiple times, is wrongly rolled back to a previous state, as if the parsers anotherParser2 and anotherParser3 did not consume anything at all. This error is prevented by composing parsers through >>= instead.
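For contrast, here is the same pipeline written with >>=, a sketch that assumes someParser and anotherParser1 … anotherParser4 are reworked as values of the Parser type from the question (combined is a name I made up). Each parser now consumes the leftover of the previous one automatically, so there is no sN variable that can be passed by mistake:
combined =
  someParser     >>= \c1 ->
  anotherParser1 >>= \c2 ->
  anotherParser2 >>= \c3 ->
  anotherParser3 >>= \c4 ->
  anotherParser4 >>= \c5 ->   -- automatically receives the leftover of anotherParser3
  return [c1,c2,c3,c4,c5]
-- run it with: parse combined someInitialString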
I'll try shedding some more light regarding >>.
As you see in the other answer, you should desugar the do's into >>= to better understand what's going on.
Let's for example write a parser that parses two chars and returns them.
twoChars :: Parser (Char,Char)
twoChars = do
  i <- item
  j <- item
  return (i,j)
Now, desugar the do syntax:
twoChars :: Parser (Char,Char)
twoChars =
  item >>= (\i ->
    item >>= (\j ->
      return (i,j) ) )
I put brackets for clarity. As you see, the second item receives the result of the first item parser in the anonymous function, with the result bound to i. The >>= function takes a parser and a function, and returns a parser. The best way to understand it is to plug it into the definition:
f = \i -> item >>= \j -> return (i,j)
twoChars = item >>= f
twoChars = Parser (\cs -> concat [parse (f a) cs' | (a,cs') <- parse item cs])
So we got back a new Parser. Try to imagine what it will do on an input "abc". cs is bound to "abc", and the item Parser is used to get back [('a',"bc")]. Now, we apply f to 'a', to get back the new parser:
item >>= \j -> return ('a',j)
This parser will be passed the rest of our string ("bc"), and it will use the item parser to get out the 'b', with the \j above bound to 'b'. We then get a return ('a','b') statement, which puts ('a','b') into a parser that just returns ('a','b').
I hope this clears up how the information flow happens. Now, suppose that you want to ignore a character. You could do it like this.
twoChars :: Parser (Char,Char)
twoChars =
  item >>= \i ->
  item >>= \j ->
  item >>= \k ->
  return (i,k)
Even though j is bound to 'b' for the example "abc", you never use it, so we can replace j by _.
twoChars :: Parser (Char,Char)
twoChars =
  item >>= \i ->
  item >>= \_ ->
  item >>= \k ->
  return (i,k)
But we also know that >> :: m a -> m b -> m b can be defined as:
p >> q = p >>= \_ -> q
So we are left with
twoChars :: Parser (Char,Char)
twoChars =
  item >>= \i ->
  item >>
  item >>= \k ->
  return (i,k)
Finally, you can sugar this back into do. The application of >> simply sugars into a single-line statement with no binding. It results in:
twoChars :: Parser (Char,Char)
twoChars = do
  i <- item
  item
  j <- item
  return (i,j)
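As a quick sanity check (my own expected output, not from the original answer), running this final version should skip the middle character just like p from the question; this assumes the Parser type and item from the question are in scope, plus Functor and Applicative instances so that the Monad instance is accepted by modern GHC:
ghci> parse twoChars "abc"
[(('a','c'),"")]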
Hope this cleared some things up.
The more uniform translation of your
p3 = do { a <- item; item; b <- item; return (a,b)}
-- do { a <- item; z <- item; b <- item; return (a,b)} -- z is ignored
is
p3 = item >>= (\a ->
       item >>= (\z ->
         item >>= (\b ->
           return (a,b)))) -- z is unused
(the key observation here is that the functions are nested), which means that
-- parse (return a) cs  =  [(a,cs)]
-- parse (p >>= f)  cs  =  [r | (a,cs1) <- parse p cs,   -- concat
--                              r <- parse (f a) cs1]    -- inlined !
parse p3 cs
 = [ r | (a,cs1) <- parse item cs,
         r <- [ r | (z,cs2) <- parse item cs1,
                    r <- [ r | (b,cs3) <- parse item cs2,
                               r <- -- parse (return (a,b)) cs3
                                    [((a,b),cs3)]]]]      -- z is unused
 = [ ((a,b),cs3) | (a,cs1) <- parse item cs,
                   (_,cs2) <- parse item cs1,
                   (b,cs3) <- parse item cs2]
So you see, "the input string" does change: first it's cs, then cs1, then cs2.
That is the simple real computation behind all the Parser tags and do syntax. It's all just about the chaining of inputs and outputs in the nested loops, in the end:
parse p3 cs =
  for each (a,cs1) in (parse item cs):
    for each (z,cs2) in (parse item cs1):
      for each (b,cs3) in (parse item cs2):
        yield ((a,b),cs3)

How to restrict backtracking in a monad transformer parser combinator

tl;dr, How do I implement parsers whose backtracking can be restricted, where the parsers are monad transformer stacks?
I haven't found any papers, blogs, or example implementations of this approach; it seems the typical approach to restricting backtracking is a datatype with additional constructors, or the Parsec approach where backtracking is off by default.
My current implementation -- using a commit combinator, see below -- is wrong; I'm not sure about the types, whether it belongs in a type class, and my instances are less generic than it feels like they should be.
Can anyone describe how to do this cleanly, or point me to resources?
I've added my current code below; sorry for the post being so long!
The stack:
StateT
MaybeT/ListT
Either e
The intent is that backtracking operates in the middle layer -- a Nothing or an empty list wouldn't necessarily yield an error, it'd just mean that a different branch should be tried -- whereas the bottom layer is for errors (with some contextual information) that immediately abort the parsing.
{-# LANGUAGE NoMonomorphismRestriction, FunctionalDependencies,
             FlexibleInstances, UndecidableInstances #-}
import Control.Monad.Trans.State (StateT(..))
import Control.Monad.State.Class (MonadState(..))
import Control.Monad.Trans.Maybe (MaybeT(..))
import Control.Monad.Trans.List (ListT(..))
import Control.Monad (MonadPlus(..), guard)
type Parser e t mm a = StateT [t] (mm (Either e)) a
newtype DParser e t a =
  DParser {getDParser :: Parser e t MaybeT a}
instance Monad (DParser e t) where
  return = DParser . return
  (DParser d) >>= f = DParser (d >>= (getDParser . f))
instance MonadPlus (DParser e t) where
  mzero = DParser (StateT (const (MaybeT (Right Nothing))))
  mplus = undefined -- will worry about later
instance MonadState [t] (DParser e t) where
  get = DParser get
  put = DParser . put
A couple of parsing classes:
class (Monad m) => MonadParser t m n | m -> t, m -> n where
  item  :: m t
  parse :: m a -> [t] -> n (a, [t])
class (Monad m, MonadParser t m n) => CommitParser t m n where
  commit :: m a -> m a
Their instances:
instance MonadParser t (DParser e t) (MaybeT (Either e)) where
  item =
    get >>= \xs -> case xs of
      (y:ys) -> put ys >> return y
      []     -> mzero
  parse = runStateT . getDParser
instance CommitParser t (DParser [t] t) (MaybeT (Either [t])) where
  commit p =
    DParser (
      StateT (\ts -> MaybeT $ case runMaybeT (parse p ts) of
        Left e         -> Left e
        Right Nothing  -> Left ts
        Right (Just x) -> Right (Just x)))
And a couple more combinators:
satisfy f =
  item >>= \x ->
  guard (f x) >>
  return x
literal x = satisfy (== x)
Then these parsers:
ab = literal 'a' >> literal 'b'
ab' = literal 'a' >> commit (literal 'b')
give these results:
> myParse ab "abcd"
Right (Just ('b',"cd")) -- succeeds
> myParse ab' "abcd"
Right (Just ('b',"cd")) -- 'commit' doesn't affect success
> myParse ab "acd"
Right Nothing -- <== failure but not an error
> myParse ab' "acd"
Left "cd" -- <== error b/c of 'commit'
The answer appears to be in the MonadOr type class (which unfortunately for me is not part of the standard libraries):
class MonadZero m => MonadOr m where
  morelse :: m a -> m a -> m a
satisfying Monoid and Left Catch:
morelse mzero b = b
morelse a mzero = a
morelse (morelse a b) c = morelse a (morelse b c)
morelse (return a) b = return a
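To make this concrete, here is a minimal, self-contained sketch of my own (the class and method names follow the MonadPlus reform proposal; none of this is in the standard libraries), showing Maybe as a MonadOr instance that satisfies Left Catch. This matches the restricted-backtracking reading of the laws above: once the left alternative succeeds, the right one is never tried.
class Monad m => MonadZero m where
  zero :: m a                      -- named 'zero' here to avoid clashing with Control.Monad.mzero
class MonadZero m => MonadOr m where
  morelse :: m a -> m a -> m a
instance MonadZero Maybe where
  zero = Nothing
instance MonadOr Maybe where
  morelse (Just x) _ = Just x      -- Left Catch: a success on the left wins outright
  morelse Nothing  b = b           -- only a failure on the left falls through to b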
