Parsing a Complex Number from Scratch - parsing

Given a data type data CI = CI Int Int, representing a complex number, I want to build a parser for CI that can convert "a" to CI a 0 and "(a,b)" to CI a b. For example, I want a function parseCI such runParser parseCI "(1,2)" returns the value [(CI 1 2, "")] (ideally, but something similar is fine). I also want to make CI an instance of read.
I would like to do this using functions and definitions from the code below (basically, without anything advanced, like Parsec), but I'm not sure where to start. Some starting code to set me on the right track and/or a hint would be helpful. I'm not looking for a full answer, as I'd like to figure that out myself.
module Parser where
import Control.Applicative
import Control.Monad
newtype Parser a = Parser { runParser :: String -> [(a,String)] }
satisfy :: (Char -> Bool) -> Parser Char
satisfy f = Parser $ \s -> case s of
[] -> []
a:as -> [(a,as) | f a]
char :: Char -> Parser Char
char = satisfy . (==)
string :: String -> Parser String
string str = Parser $ \s -> [(t,u) | let (t,u) = splitAt (length str) s, str == t]
instance Functor Parser where
fmap f p = Parser $ \s ->
[ (f a,t)
| (a,t) <- runParser p s
]
instance Applicative Parser where
pure a = Parser $ \s -> [(a,s)]
af <*> aa = Parser $ \s ->
[ (f a,u)
| (f,t) <- runParser af s
, (a,u) <- runParser aa t
]
instance Alternative Parser where
empty = Parser $ \s -> []
p1 <|> p2 = Parser $ (++) <$> runParser p1 <*> runParser p2`
instance Monad Parser where
return = pure
ma >>= f = Parser $ \s ->
[ (b,u)
| (a,t) <- runParser ma s
, (b,u) <- runParser (f a) t
]
instance MonadPlus Parser where
mzero = empty
mplus = (<|>)

You've probably already seen it, but in case you haven't: Monadic Parsing in Haskell sets up parsing like this.
Since you have two different ways of parsing CI, you might want to approach this as two problems: make one parser parseCI1 that parses "a" to CI a 0 and make another parser parseCI2 that parses "(a,b)" to CI a b. Then, you can combine these into one with
parseCI = parseCI1 <|> parseCI2
For both of these subparsers, you will need some way of parsing integers: parseInt :: Parser Int. When making parseInt, you will likely want to use some combination of satisfy, isDigit, read, and possibly some (depending on how you go about solving this).
Making CI an instance of read is a bit more straightforward once you have parseCI done:
instance Read CI where
readsPrec _ = runParser parseCI

Related

Haskell : Operator Parser keeps going to undefined rather than inputs

I'm practicing writing parsers. I'm using Tsodings JSON Parser video as reference. I'm trying to add to it by being able to parse arithmetic of arbitrary length and I have come up with the following AST.
data HVal
= HInteger Integer -- No Support For Floats
| HBool Bool
| HNull
| HString String
| HChar Char
| HList [HVal]
| HObj [(String, HVal)]
deriving (Show, Eq, Read)
data Op -- There's only one operator for the sake of brevity at the moment.
= Add
deriving (Show, Read)
newtype Parser a = Parser {
runParser :: String -> Maybe (String, a)
}
The following functions is my attempt of implementing the operator parser.
ops :: [Char]
ops = ['+']
isOp :: Char -> Bool
isOp c = elem c ops
spanP :: (Char -> Bool) -> Parser String
spanP f = Parser $ \input -> let (token, rest) = span f input
in Just (rest, token)
opLiteral :: Parser String
opLiteral = spanP isOp
sOp :: String -> Op
sOp "+" = Add
sOp _ = undefined
parseOp :: Parser Op
parseOp = sOp <$> (charP '"' *> opLiteral <* charP '"')
The logic above is similar to how strings are parsed therefore my assumption was that the only difference was looking specifically for an operator rather than anything that's not a number between quotation marks. It does seemingly begin to parse correctly but it then gives me the following error:
λ > runParser parseOp "\"+\""
Just ("+\"",*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
undefined, called at /DIRECTORY/parser.hs:110:11 in main:Main
I'm confused as to where the error is occurring. I'm assuming it's to do with sOp mainly due to how the other functions work as intended as the rest of parseOp being a translation of the parseString function:
stringLiteral :: Parser String
stringLiteral = spanP (/= '"')
parseString :: Parser HVal
parseString = HString <$> (charP '"' *> stringLiteral <* charP '"')
The only reason why I have sOp however is that if it was replaced with say Op, I would get the error that the following doesn't exist Op :: String -> Op. When I say this my inclination was that the string coming from the parsed expression would be passed into this function wherein I could return the appropriate operator. This however is incorrect and I'm not sure how to proceed.
charP and Applicative Instance
charP :: Char -> Parser Char
charP x = Parser $ f
where f (y:ys)
| y == x = Just (ys, x)
| otherwise = Nothing
f [] = Nothing
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input', a) <- q input
Just (input', f a)
The implementation of (<*>) is the culprit. You did not use input' in the next call to q, but used input instead. As a result you pass the string to the next parser without "eating" characters. You can fix this with:
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input'', a) <- q input'
Just (input'', f a)
With the updated instance for Applicative, we get:
*Main> runParser parseOp "\"+\""
Just ("",Add)

Simple Parse in Haskell with do construct

I'm trying to write a simple Parser, so all the declarations are listed in the image below, but when I try to compile this module it fails.
I'm following the tutorial provided by this source ->
Haskell lessons suggested by official site and specifically this video by Dr. Erik Meijer (Lesson on Parser with "do" construct).
The problem is that I thought that the "do" construct was able to "concatenate" outputs from a previous function in a descending way, but the way that this p function should work seems to be magic to me. What's the right implementation?
-- Generic Parser.
type Parser a = String -> [(a, String)]
-- A simple Parser that captures the first char of the string, puts it in
-- the first position of the couple and then puts the rest of the string into
-- the second place of the couple inside the singleton list.
item :: Parser Char
item = \inp -> case inp of
[] -> []
(x:xs) -> [(x, xs)]
-- Simple parser that always fails.
failure :: Parser a
failure = \inp -> []
-- Returns the type of the parser without operating on the input.
return1 :: a -> Parser a
return1 v = \inp -> [(v, inp)]
-- Explicit call to parse.
parse :: Parser a -> String -> [(a, String)]
parse p inp = p inp
-- Some kind of "or" operator that if the first parser fails (returning an empty list) it
-- parses the second parser.
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = \inp -> case p inp of
[] -> parse q inp
[(v, out)] -> [(v, out)]
-- The function within I'm having troubles.
p :: Parser (Char,Char)
p = do
x <- item
item
y <- item
return1 (x, y)
This is how it's explained by Dr. Meijer:
And this is how it should work:
Your Parser is just a type synonym for a function. The friendly parsers you've seen in use are all proper types of their own, with Functor and Applicative instances, along with (in most cases) Alternative, Monad, and MonadPlus instances. You probably want something that looks like the following (untested, never compiled) version.
import Control.Monad (ap, liftM)
import Control.Applicative (Alternative (..))
newtype Parser a = Parser
{ runParser :: String -> [(a, String)] }
instance Functor Parser where
fmap = liftM
instance Applicative Parser where
pure v = Parser $ \inp -> [(v, inp)]
(<*>) = ap
instance Monad Parser where
-- The next line isn't required for
-- recent GHC versions
-- return = pure
Parser m >>= f = Parser $ \s ->
[(r, s'') | (x, s') <- m s
, (r, s'') <- runParser (f r) s']
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = Parser $ \inp -> case runParser p inp of
[] -> runParser q inp
[(v, out)] -> [(v, out)]
failure :: Parser a
failure = Parser $ \inp -> []
instance Alternative Parser where
(<|>) = (+++)
empty = failure
instance MonadPlus Parser

Implementing (<++) within a Haskell Parser

The goal of this is to implement <++ in Haskell using StateT to act as a parser
import Control.Monad.State
type Parser = StateT String []
runParser :: Parser a -> String -> [(a,String)]
runParser = runStateT
parser :: (String -> [(a,String)]) -> Parser a
parser = StateT
(<++) :: Parser a -> Parser a -> Parser a
From this base, how could I implement <++ so that it mirrors the definition in Text.ParserCombinators.ReadP
Local, exclusive, left-biased choice: If left parser locally produces any result at all, then right parser is not used.
Your parser are functions String -> [(a,String)] where the argument is the initial state. So lets construct such a function that tries the first parser, and if it fails tries the second parser:
(<++) :: Parser a -> Parser a -> Parser a
p1 <++ p2 =
parser $ \s0 -> -- 1. captures initial state
case runParser p1 s0 of -- 2. run first parser
(x:xs) -> x : xs -- 3. success
[] -> runParser p2 s0 -- 4. otherwise run second parser
Some parsers to try this out
-- | Always fails
failP :: Parser a
failP = parser (const [])
-- | Parses an Int
intP :: Parser Int
intP = parser reads
Output as expected? (Perhaps there are more enlightening test cases...)
λ> runParser (failP <++ intP) "2014"
[(2014,"")]
λ> runParser (intP <++ failP) "2014"
[(2014,"")]
λ> runParser (intP <++ intP) "2014"
[(2014,"")]

Minimal Purely Applicative Parser

I'm trying to figure out how to build a "purely applicative parser" based on a simple parser implementation. The parser would not use monads in its implementation. I asked this question previously but mis-framed it so I'm trying again.
Here is the basic type and its Functor, Applicative and Alternative implementations:
newtype Parser a = Parser { parse :: String -> [(a,String)] }
instance Functor Parser where
fmap f (Parser cs) = Parser (\s -> [(f a, b) | (a, b) <- cs s])
instance Applicative Parser where
pure = Parser (\s -> [(a,s)])
(Parser cs1) <*> (Parser cs2) = Parser (\s -> [(f a, s2) | (f, s1) <- cs1 s, (a, s2) <- cs2 s1])
instance Alternative Parser where
empty = Parser $ \s -> []
p <|> q = Parser $ \s ->
case parse p s of
[] -> parse q s
r -> r
The item function takes a character off the stream:
item :: Parser Char
item = Parser $ \s ->
case s of
[] -> []
(c:cs) -> [(c,cs)]
At this point, I want to implement digit. I can of course do this:
digit = Parser $ \s ->
case s of
[] -> []
(c:cs) -> if isDigit c then [(c, cs)] else []
but I'm replicating the code of item. I'd like to implement digit based on item.
How do I go about implementing digit, using item to take a character off the stream and then checking to see if the character is a digit without bringing monadic concepts into the implementation?
First, let us write down all the tools we currently have at hand:
-- Data constructor
Parser :: (String -> [(a, String)]) -> Parser a
-- field accessor
parse :: Parser a -> String -> [(a, String)]
-- instances, replace 'f' by 'Parser'
fmap :: Functor f => (a -> b) -> f a -> f b
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
pure :: Applicative f => a -> f a
-- the parser at hand
item :: Parser Char
-- the parser we want to write with item
digit :: Parser Char
digit = magic item
-- ?
magic :: Parser Char -> Parser Char
The real question at hand is "what is magic"? There are only so many things we can use. Its type indicates fmap, but we can rule that out. All we can provide is some function a -> b, but there is no f :: Char -> Char that makes fmap f indicate a failure.
What about (<*>), can this help? Well, again, the answer is no. The only thing we can do here is to take the (a -> b) out of the context and apply it; whatever that means in the context of the given Applicative. We can rule pure out.
The problem is that we need to check the Char that item might parse and change the context. We need something like Char -> Parser Char
But we didn't rule Parser or parse out!
magic p = Parser $ \s ->
case parse p s of -- < item will be used here
[(c, cs)] -> if isDigit c then [(c, cs)] else []
_ -> []
Yes, I know, it's duplicate code, but now it's using item. It's using item before inspecting the character. That's the only way we can use item here. And now, there is some kind of sequence implied: item has to succeed before digit can do it's work.
Alternatively, we could have tried this way:
digit' c :: Char -> Parser Char
digit' c = if isDigit c then pure c else empty
But then fmap digit' item would have the type Parser (Parser Char), which can only get collapsed with a join-like function. That's why monads are more powerful than applicative.
That being said, you can get around all of the monad requirements if you use a more general function first:
satisfy :: (Char -> Bool) -> Parser Char
satisfy = Parser $ \s ->
case s of
(c:cs) | p c -> [(c, cs)]
_ -> []
You can then define both item and digit in terms of satisfy:
item = satisfy (const True)
digit = satisfy isDigit
That way digit does not have to inspect the result of a previous parser.
Functors allow you to act on somethings values. For example, if you have a list [1,2,3], you can change the contents. Note that Functors do not allow changing structure. map can not change the length of a list.
Applicatives allow you to combine structure, and the content is mushed together somehow. But the values can not change influence the structure.
Namely, given an item, we can change its structure, and we can change its content, but the content can not change the structure. We can't choose to fail on some content and not other.
If anyone knows how to state this more formally and provably, I'm all ears (it probably has to do with free theorems).

Parsec parsing and separating different structures

Let's say I have different parsers p1, ..., pk. I want to define a function pk :: Parser ([t1], ..., [tk]) where pi :: Parser ti.
That will parse a collection of strings (one after the other) that match any of p1...pk and separate them in the corresponding lists. Assume for simplicity that none of the strings matches two parsers.
I managed to do it, but I am really struggling to find an elegant way (preferably using many or any other builtin parsec parser).
The first step is to turn each of your parsers into parser of the big type:
p1 :: Parser t1
p2 :: Parser t2
p3 :: Parser t3
p1 = undefined
p2 = undefined
p3 = undefined
p1', p2', p3' :: Parser ([t1], [t2], [t3])
p1' = fmap (\x -> ([x], [], [])) p1
p2' = fmap (\x -> ([], [x], [])) p2
p3' = fmap (\x -> ([], [], [x])) p3
Now, we repeatedly choose from these last parsers, and concatenate the results at the end:
parser :: Parser ([t1], [t2], [t3])
parser = fmap mconcat . many . choice $ [p1', p2', p3']
There are Monoid instances for tuples up to size five; beyond that, you can either use nested tuples or a more appropriate data structure.
Representing the parsers as a list makes this easy. Using:
choice :: [Parser a] -> Parser a
many :: Parser a -> Parser [a]
We can write:
combineParsers :: [Parser a] -> Parser [a]
combineParsers = many . choice
This isn't quite right, as it bundles them all into a single list. Let's associate each parser with a unique identifier:
combineParsers' :: [(k, Parser a)] -> Parser [(k, a)]
combineParsers' = many . choice . combine
where combine = map (\(k,p) -> (,) k <$> p)
We can then convert this back to the list form:
combineParsers :: [Parser a] -> Parser [[a]]
combineParsers ps = map snd . sortBy fst <$> combineParsers' (zip [0..] ps)
You could perhaps make this more efficient by writing combineParsers' :: [(k, Parser a)] -> Parser (Map k [a]) instead, using e.g. combine = map $ \(k,p) -> fmap (\a -> Map.insertWith (++) k [a]) p.
This requires that all the parsers have the same result type, so you'll need to wrap each of their results in a data-type with Cons <$> p for each p. You can then unwrap the constructor from each list. Admittedly, this is fairly ugly, but to do it heterogeneously with tuples would require even uglier type-class hacks.
There might be a simpler solution for your specific use-case, but this is the easiest way to do it generically that I can think of.
You can't do this completely generically without type-level trickery, unfortunately. However, an approach along the lines of Daniel Wagner's suggestion can be constructed in DIY style, using polymorphic combinators and some pre-existing recursive instances. I'll present a simple example to demonstrate.
First, a few simpleminded parsers to test with:
type Parser = Parsec String ()
parseNumber :: Parser Int
parseNumber = read <$> many1 digit
parseBool :: Parser Bool
parseBool = string "yes" *> return True
<|> string "no" *> return False
parseName :: Parser String
parseName = many1 letter
Next we create a "base case" type to mark the end of the possibilities, and give it a parser that always succeeds on no input.
data Nil = Nil deriving (Eq, Ord, Read, Show, Enum, Bounded)
instance Monoid Nil where
mempty = Nil
mappend _ _ = Nil
parseNil :: Parser Nil
parseNil = return Nil
This is equivalent to (), of course--I'm only creating a new type to disambiguate in case we actually want to parse (). Next, the combinator that chains parsers together:
infixr 3 ?|>
(?|>) :: (Monoid b) => Parser a -> Parser b -> Parser ([a], b)
p1 ?|> p2 = ((,) <$> ((:[]) <$> p1) <*> pure mempty)
<|> ((,) <$> pure [] <*> p2)
What's going on here is that p1 ?|> p2 tries p1, and if it succeeds wraps that in a singleton list and fills in mempty for the second element of the tuple. If p1 fails, if fills in an empty list and uses p2 for the second element.
parseOne :: Parser ([Int], ([Bool], ([String], Nil)))
parseOne = parseNumber ?|> parseBool ?|> parseName ?|> parseNil
Combining parsers with the new combinator is simple, and the result type is pretty self-explanatory.
parseMulti :: Parser ([Int], ([Bool], ([String], Nil)))
parseMulti = mconcat <$> many1 (parseOne <* newline)
We're relying on the recursive Monoid instance for tuples and the usual instance for lists to combine multiple lines. Now, to show that it works, define a quick test:
runTest = parseTest parseMulti testInput
testInput = unlines [ "yes", "123", "acdf", "8", "no", "qq" ]
Which parses successfully as Right ([123,8],([True,False],(["acdf","qq"],Nil))).

Resources