Invertible State monad (and parsers) - parsing

Good day, ladies and gentlemen!
I'm constantly writing parsers and codecs. Implementing both parsers and printers seems to be massive code duplication. I wonder whether it is possible to invert a stateful computation, given it is isomorphic by nature.
It is possible to invert pure function composition (Control.Lens.Iso did that by defining a composition operator over isomorphisms). As it can be observed,
Iso bc cb . Iso ab ba = Iso (bc . ab) (ba . cb) -- from Lenses talk
invert (f . g) = (invert g) . (invert f) -- pseudo-code
In other words, to invert a function composition one should compose inverted functions in the opposite order. So, given all primitive isomorphic pairs are defined, one can compose them to get more complicated pairs with no code duplication. Here is an example of pure bidirectional computation (Control.Lens is used, the explanatory video can help you to get the general idea of Lenses, Folds and Traversals):
import Control.Lens
tick :: Num a => Iso' a a
tick = iso (+1) (subtract 1) -- define an isomorphic pair
double :: Num a => Iso' a a
double = iso (+2) (subtract 2) -- and another one
threeTick :: Num a => Iso' a a
-- These are composed via simple function composition!
threeTick = double . tick
main :: IO ()
main = do
print $ (4 :: Int)^.tick -- => 5
print $ (4 :: Int)^.from tick -- => 3
print $ (4 :: Int)^.threeTick -- => 7, Composable
print $ (4 :: Int)^.from threeTick -- => 1, YEAH
As you can see, I didn't need to supply the inverted version of threeTick; it is obtained by backward composition automatically!
Now, let's consider a simple parser.
data FOO = FOO Int Int deriving Show
parseFoo :: Parser FOO
parseFoo = FOO <$> decimal <* char ' '
<*> decimal
parseFoo' :: Parser FOO
parseFoo' = do
first <- decimal
void $ char ' '
second <- decimal
return $ FOO first second
printFoo :: FOO -> BS.ByteString
printFoo (FOO a b) = BS.pack(show a) <>
BS.pack(" ") <>
BS.pack(show b)
main :: IO ()
main = do
print $ parseOnly parseFoo "10 11" -- => Right (FOO 10 11)
print $ parseOnly parseFoo' "10 11" -- => Right (FOO 10 11)
print . printFoo $ FOO 10 11 -- => "10 11"
print . parseOnly parseFoo . printFoo $ FOO 10 11 -- id
You can see that both versions of parseFoo are fairly declarative (thanks to parser combinators). Note the similarity between parseFoo and printFoo. Can I define isomorphisms over primitive parsers (decimal and char) and then just derive the printer (printFoo :: FOO -> String) automatically? Ideally, parser combinators will work as well.
I tried to redefine a monadic >>= operator in order to provide inverted semantics, but I've failed to do so. I feel that one could define inverted Kleisli composition operator (monadic function composition) similarly to composition inversion, but can one use it with ordinary monads?
f :: a -> m b, inverse f :: b -> m a
g :: b -> m c, inverse g :: c -> m b
inverse (f >=> g) = (inverse f) <=< (inverse g)
Why inverse f is of type b -> m a and not m b -> a? The answer is: monadic side effect is an attribute of an arrow, not that of a data type b. The state monad dualization is further discussed in the great Expert to Expert video.
If the solution does exist, could you please supply a working example of printFoo derivation? By the way, here is an interesting paper that could help us find the solution.

You may be interested in digging in further into the lens package for the concept of a Prism.
A Prism can be used as a 'smart constructor' to build something (e.g. a pretty printed string) unconditionally, and match on it (e.g. parse).
You'd have to ignore the laws or treat the laws as holding only up to a quotient though, as the string you get out of pretty printing is very likely not exactly the string you parsed.

Related

Parse error when reading

GHC can't derive Read or Show for complicated GADTs, so I attempted to define custom instances below that satisfy read . show == id. I've simplified the example as much as possible (while still triggering the runtime error like my real code). I decided to let GHC do the heavy lifting of writing Read and Show instances by making newtype wrappers for each GADT constructor (more accurately: for each type output by the GADT). In the Read/Show instances, I simply read/show the newtype wrapper and convert where necessary. I assumed this was foolproof: I'm letting GHC define all of the non-trivial instances. However, I seem to have done something wrong.
In my real code, Foo below is a complicated GADT that GHC won't derive Show or Read for. Since Foo is a wrapper (in part) around a newtype, I use the derived Show/Read instances for that.
{-# LANGUAGE FlexibleContexts, FlexibleInstances, GADTs, ScopedTypeVariables #-}
import Text.Read (Read(readPrec))
newtype Bar r = Bar r deriving (Show, Read)
newtype Foo r = Foo (Bar r)
-- use the GHC-derived Show/Read for Bar
instance (Show r) => Show (Foo r) where
show (Foo x) = show x
instance (Read r) => Read (Foo r) where
readPrec = Foo <$> readPrec
This instance seems to work: I can call read . show and I get back the input.
Now I have a wrapper around Foo:
data U t rep r where
U1 :: t r -> U t Int r
U2 :: t r -> U t Char r
-- use the Read/Show instances for U1Wrap and U2Wrap
newtype U1Wrap t r = U1Wrap {unU1Wrap :: t r} deriving (Show, Read)
newtype U2Wrap t r = U2Wrap (t r) deriving (Show, Read)
instance (Read (t r)) => Read (U t Int r) where
readPrec = (U1 . unU1Wrap) <$> readPrec
instance (Read (U2Wrap t r)) => Read (U t Char r) where
readPrec = do
x <- readPrec
return $ case x of
(U2Wrap y) -> U2 y
instance (Show (t r)) => Show (U t Int r) where
show (U1 x) = show $ U1Wrap x
instance (Show (t r)) => Show (U t Char r) where
show (U2 x) = show (U2Wrap x :: U2Wrap t r)
Like Foo, U is a complicated GADT, so I define custom newtype wrappers for each output type of the GADT. Unfortunately, my Show/Read instances don't work:
main :: IO ()
main = do
let x = U1 $ Foo $ Bar 3
y = U2 $ Foo $ Bar 3
print $ show (read (show x) `asTypeOf` x)
print $ show (read (show y) `asTypeOf` y)
main prints the first line, but I get Exception: Prelude.read: no parse on the second line.
This is my first time using Read, so I suspect I'm doing something incorrectly, though I don't see what that is.
My questions are:
Why am I getting this error, and logically how can I fix it? (There are several ways to poke the minimal example above to make the error go away, but I can't do those things in my real code.)
Is there a different/better high-level approach to Reading GADTs?
Your custom Show instance for Foo doesn't parenthesize correctly.
> U2 $ Foo $ Bar 3
U2Wrap Bar 3
When writing custom Show instances, you should define showsPrec instead. That's because show just gives back a string independently of the context, while showsPrec parenthesizes based on precendence. See Text.Show for further documentation.
instance (Show r) => Show (Foo r) where
showsPrec n (Foo x) = showsPrec n x
Which works here.
I don't know of an elegant approach that would automacially get us a Read instance for this GADT. The deriving mechanism can't seem to figure out that only a single constructor has to be considered for each rep.
At least Show can be derived even here. I also include here a manual Read instance which I hope conforms to Show. I tried to mimic the definitions in Text.Read, which you could also do in other cases. Alternatively, one could use the -ddump-deriv GHC argument to look at derived Read instances, and try to copy them in custom code.
{-# LANGUAGE GADTs, StandaloneDeriving, FlexibleInstances, FlexibleContexts #-}
import Text.Read
import Data.Proxy
data U t rep r where
U1 :: t r -> U t Int r
U2 :: t r -> U t Char r
deriving instance Show (t r) => Show (U t rep r)
instance Read (t r) => Read (U t Int r) where
readPrec = parens $ do
prec 10 $ do
Ident "U1" <- lexP
U1 <$> readPrec
instance Read (t r) => Read (U t Char r) where
readPrec = parens $ do
prec 10 $ do
Ident "U2" <- lexP
U2 <$> readPrec

Implement `Applicative Parser`'s Apply Function

From Brent Yorgey's 2013 Penn class, after getting help on defining a Functor Parser, I'm attempting to make an Applicative Parser:
--p1 <*> p2 represents the parser which first runs p1 (which will
--consume some input and produce a function), then passes the
--remaining input to p2 (which consumes more input and produces
--some value), then returns the result of applying the function to the
--value
Here's my attempt:
instance Applicative (Parser) where
pure x = Parser $ \_ -> Just (x, [])
(Parser f) <*> (Parser g) = case (\ys -> f ys) of Nothing -> Parser Nothing
Just (_, xs) -> Parser $ g xs
However, I'm getting compile-time errors on the apply (<*>) definition.
Intuitively, I believe that using <*> achieves AND functionality.
If I have a parser for foo and a parser for bar, then I should be able to use apply <*> to say: foo followed by bar. In other words, input of foobar should successfully match, whereas foobip would not. It would fail on the second parser.
However, I believe that the types are:
Parser (a -> b) -> Parser a -> Parser b
So, that makes me think that my intuition is not entirely correct.
Please give me a tip to guide me towards understanding how to implement apply.
Your code is predicated on a misunderstanding of what a Parser is. Don't worry, virtually everybody makes this mistake.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }
Lets break down what this means.
String -> Maybe (a, String)
[1] [2] [3] [4]
[1]: I take a string and return Maybe (a, String)
[2]: I might not succeed in parsing the input into the desired datatype
[3]: The desired type I am parsing the String into
[4]: Remaining input after having consumed the amount of data required to parse a
Parser is a function of text input to Maybe a tuple of a value and the rest of the text. Parser is emphatically not a tuple, otherwise you wouldn't have a parser. Just data in a tuple.
I'm not going to tell you how to implement <*> and nobody else should either as it would deprive you of the experience.
However, I'll give you pure so you understand the basic pattern:
pure a = Parser (\s -> Just (a, s))
See? It's a function of s -> Maybe (a, s). I intentionally mimicked the type variables in my terms to make it more obvious.

Parse string with lex in Haskell

I'm following Gentle introduction to Haskell tutorial and the code presented there seems to be broken. I need to understand whether it is so, or my seeing of the concept is wrong.
I am implementing parser for custom type:
data Tree a = Leaf a | Branch (Tree a) (Tree a)
printing function for convenience
showsTree :: Show a => Tree a -> String -> String
showsTree (Leaf x) = shows x
showsTree (Branch l r) = ('<':) . showsTree l . ('|':) . showsTree r . ('>':)
instance Show a => Show (Tree a) where
showsPrec _ x = showsTree x
this parser is fine but breaks when there are spaces
readsTree :: (Read a) => String -> [(Tree a, String)]
readsTree ('<':s) = [(Branch l r, u) | (l, '|':t) <- readsTree s,
(r, '>':u) <- readsTree t ]
readsTree s = [(Leaf x, t) | (x,t) <- reads s]
this one is said to be a better solution, but it does not work without spaces
readsTree_lex :: (Read a) => String -> [(Tree a, String)]
readsTree_lex s = [(Branch l r, x) | ("<", t) <- lex s,
(l, u) <- readsTree_lex t,
("|", v) <- lex u,
(r, w) <- readsTree_lex v,
(">", x) <- lex w ]
++
[(Leaf x, t) | (x, t) <- reads s ]
next I pick one of parsers to use with read
instance Read a => Read (Tree a) where
readsPrec _ s = readsTree s
then I load it in ghci using Leksah debug mode (this is unrelevant, I guess), and try to parse two strings:
read "<1|<2|3>>" :: Tree Int -- succeeds with readsTree
read "<1| <2|3> >" :: Tree Int -- succeeds with readsTree_lex
when lex encounters |<2... part of the former string, it splits onto ("|<", _). That does not match ("|", v) <- lex u part of parser and fails to complete parsing.
There are two questions arising:
how do I define parser that really ignores spaces, not requires them?
how can I define rules for splitting encountered literals with lex
speaking of second question -- it is asked more of curiousity as defining my own lexer seems to be more correct than defining rules of existing one.
lex splits into Haskell lexemes, skipping whitespace.
This means that since Haskell permits |< as a lexeme, lex will not split it into two lexemes, since that's not how it parses in Haskell.
You can only use lex in your parser if you're using the same (or similar) syntactic rules to Haskell.
If you want to ignore all whitespace (as opposed to making any whitespace equivalent to one space), it's much simpler and more efficient to first run filter (not.isSpace).
The answer to this seems to be a small gap between text of Gentle introduction to Haskell and its code samples, plus an error in sample code.
there should also be one more lexer, but there is no working example (satisfying my need) in codebase, so I written one. Please point out any flaw in it:
lexAll :: ReadS String
lexAll s = case lex s of
[("",_)] -> [] -- nothing to parse.
[(c, r)] -> if length c == 1 then [(c, r)] -- we will try to match
else [(c, r), ([head s], tail s)]-- not only as it was
any_else -> any_else -- parsed but also splitted
author sais:
Finally, the complete reader. This is not sensitive to white space as
were the previous versions. When you derive the Show class for a data
type the reader generated automatically is similar to this in style.
but lexAll should be used instead of lex (which seems to be said error):
readsTree' :: (Read a) => ReadS (Tree a)
readsTree' s = [(Branch l r, x) | ("<", t) <- lexAll s,
(l, u) <- readsTree' t,
("|", v) <- lexAll u,
(r, w) <- readsTree' v,
(">", x) <- lexAll w ]
++
[(Leaf x, t) | (x, t) <- reads s]

Bottleneck in math parser Haskell

I got this code below from the wiki books page here. It parses math expressions, and it works very well for the code I'm working on. Although there is one problem, when I start to add layers of brackets to my expression the program slows down dramatically, crashing my computer at some point. It has something to do with the number of operators I have it check for, the more operators I have the less brackets I can parse. Is there anyway to get around or fix this bottleneck?
Any help is much appreciated.
import Text.ParserCombinators.ReadP
-- slower
operators = [("Equality",'='),("Sum",'+'), ("Product",'*'), ("Division",'/'), ("Power",'^')]
-- faster
-- ~ operators = [("Sum",'+'), ("Product",'*'), ("Power",'^')]
skipWhitespace = do
many (choice (map char [' ','\n']))
return ()
brackets p = do
skipWhitespace
char '('
r <- p
skipWhitespace
char ')'
return r
data Tree op = Apply (Tree op) (Tree op) | Branch op (Tree op) (Tree op) | Leaf String deriving Show
leaf = chainl1 (brackets tree
+++ do
skipWhitespace
s <- many1 (choice (map char "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-[]" ))
return (Leaf s))
(return Apply)
tree = foldr (\(op,name) p ->
let this = p +++ do
a <- p +++ brackets tree
skipWhitespace
char name
b <- this
return (Branch op a b)
in this)
(leaf +++ brackets tree)
operators
readA str = fst $ last $ readP_to_S tree str
main = do loop
loop = do
-- ~ try this
-- ~ (a+b+(c*d))
str <- getLine
print $ last $ readP_to_S tree str
loop
This is a classic problem in backtracking (or parallel parsing, they are basically the same thing).... Backtracking grows (at worst) exponentially with the size of the input, so the time to parse something can suddenly explode. In practice backtracking works OK in language parsing for most input, but explodes with recursive infix operator notation. You can see why by considering how many possibile ways this could be parsed (using made up & and % operators):
a & b % c & d
could be parsed as
a & (b % (c & d))
a & ((b % c) & d)
(a & (b % c)) & d
((a & b) % c) & d
This grows like 2^(n-1). The solution to this is to add some operator precidence information earlier in the parse, and throw away all but the sensible cases.... You will need an extra stack to hold pending operators, but you can always go through infix operator expressions in O(1).
LR parsers like yacc do this for you.... With a parser combinator you need to do it by hand. In parsec, there is a Expr package with a buildExpressionParser function that builds this for you.

Preserving the failure mode of a parser in Parsec

Problem statement
Suppose I have two parsers p and q and I concatenate them like this:
r = try p *> q
In Parsec, the behavior of this is:
If p fails without consuming input, then r fails without consuming input.
If p fails after consuming input, then r fails without consuming input.
if q fails without consuming input, then r fails after consuming p.
if q fails after consuming input, then r fails after consuming p and parts of q.
However, the behavior I'm looking for is a bit unusual:
If p fails without consuming input, then r should fail without consuming input.
If p fails after consuming input, then r should fail without consuming input.
if q fails without consuming input, then r should fail without consuming input.
if q fails after consuming input, then r should fail after consuming some input.
I can't seem to think of a clean way to do this.
Rationale
The reason is that I have a parser like this:
s = (:) <$> q <*> many r
The q parser, embedded inside the r parser, needs a way to signal either: invalid input (which occurs when q consumes input but fails), or end of the many loop (which occurs when q doesn't consume anything and fails). If the input is invalid, it should just fail the parser entirely and report the problem to the user. If there is no more input to consume, then it should end the many loop (without reporting a parser error to the user). The trouble is that it's possible that the input ends with a p but without any more valid q's to consume, in which case q will fail but without consuming any input.
So I was wondering if anyone have an elegant way to solve this problem? Thanks.
Addendum: Example
p = string "P"
q = (++) <$> try (string "xy") <*> string "z"
Test input on (hypothetical) parser s, had it worked the way I wanted:
xyz (accept)
xyzP (accept; P remains unparsed)
xyzPx (accept; Px remains unparsed; q failed but did not consume any input)
xyzPxy (reject; parser q consumed xy but failed)
xyzPxyz (accept)
In the form r = try p *> q, s will actually reject both case #2 and case #3 above. Of course, it's possible to achieve the above behavior by writing:
r = (++) <$> try (string "P" *> string "xy") <*> string "z"
but this isn't a general solution that works for any parsers p and q. (Perhaps a general solution doesn't exist?)
I believe I found a solution. It's not especially nice, but seems to works. At least something to start with:
{-# LANGUAGE FlexibleContexts #-}
import Control.Applicative hiding (many, (<|>))
import Control.Monad (void)
import Control.Monad.Trans (lift)
import Control.Monad.Trans.Maybe
import Text.Parsec hiding (optional)
import Text.Parsec.Char
import Text.Parsec.String
rcomb :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m b -> ParsecT s u m b
rcomb p q = ((test $ opt p *> opt q) <|> pure (Just ()))
>>= maybe empty (\_ -> p *> q)
where
-- | Converts failure to #MaybeT Nothing#:
opt = MaybeT . optional -- optional from Control.Applicative!
-- | Tests running a parser, returns Nothing if parsers failed consuming no
-- input, Just () otherwise.
test = lookAhead . try . runMaybeT . void
This is the r combinator you're asking for. The idea is that we first execute the parsers in a "test" run (using lookAhead . try) and if any of them fails without consuming input, we record it as Nothing inside MaybeT. This is accomplished by opt, it converts a failure to Nothing and wraps it into MaybeT. Thanks to MaybeT, if opt p returns Nothing, opt q is skipped.
If both p and q succeed, the test .. part returns Just (). And if one of them consumes input, the whole test .. fails. This way, we distinguish the 3 possibilities:
Failure with some input consumed by p or q.
Failure such that the failing part doesn't consume input.
Success.
After <|> pure (Just ()) both 1. and 3. result in Just (), while 2. results in Nothing. Finally, the maybe part converts Nothing into a non-consuming failure, and Just () into running the parsers again, now without any protection. This means that 1. fails again with consuming some input, and 3. succeeds.
Testing:
samples =
[ "xyz" -- (accept)
, "xyzP" -- (accept; P remains unparsed)
, "xyzPz" -- (accept; Pz remains unparsed)
, "xyzPx" -- (accept; Px remains unparsed; q failed but did not consume any input)
, "xyzPxy" -- (reject; parser q consumed xy but failed)
, "xyzPxyz" -- (accept)
]
main = do
-- Runs a parser and then accept anything, which shows what's left in the
-- input buffer:
let run p x = runP ((,) <$> p <*> many anyChar) () x x
let p, q :: Parser String
p = string "P"
q = (++) <$> try (string "xy") <*> string "z"
let parser = show <$> ((:) <$> q <*> many (rcomb p q))
mapM_ (print . run parser) samples

Resources