How do I generalize a Parser which extracts values from tokens?

How do I generalize a Parser which extracts values from tokens? - parsing

I'm creating a parser in Haskell for a simple language. The parser takes a list of tokens, generated by a separate tokenizer, and turns it into a structured tree. While I was writing this Parser, I came across a reoccurring pattern that I would like to generalize. Here are the relevant definitions:
--Parser Definition
data Parser a b = Parser { runParser :: [a] -> Maybe ([a], b) }
--Token Definition
data Token = TokOp { getTokOp :: Operator }
| TokInt { getTokInt :: Int }
| TokIdent { getTokIdent :: String }
| TokAssign
| TokLParen
| TokRParen
deriving (Eq, Show)
The parser also has instances defined for MonadPlus and all of its super classes. Here are two examples of the reoccurring Pattern I am trying to generalize:
-- Identifier Parser
identP :: Parser Token String
identP = Parser $ \input -> case input of
TokIdent s : rest -> Just (rest, s)
_ -> Nothing
--Integer Parser
intP :: Parser Token Int
intP = Parser $ \input -> case input of
TokInt n : rest -> Just (rest, n)
_ -> Nothing
As you can see, these two examples are incredibly similar, yet I see no way generalize it. Ideally I would want some function of type extractToken :: ?? -> Parser Token a where a is the value contained by the token. I am guessing the solution involves >>=, but I am not experienced enough with Monads to figure it out. This also may be impossible, I am not sure.

It seems difficult to avoid at least some boilerplate. One simple approach is to manually define the field selectors to return Maybes:
{-# LANGUAGE LambdaCase #-}
data Token = TokOp Operator
| TokInt Int
| TokIdent String
| TokAssign
| TokLParen
| TokRParen
deriving (Eq, Show)
getTokOp = \case { TokOp x -> Just x ; _ -> Nothing }
getTokInt = \case { TokInt x -> Just x ; _ -> Nothing }
getTokIdent = \case { TokIdent x -> Just x ; _ -> Nothing }
and then the rest is:
fieldP :: (Token -> Maybe a) -> Parser Token a
fieldP sel = Parser $ \case tok:rest -> (,) rest <$> sel tok
[] -> Nothing
opP = fieldP getTokOp
identP = fieldP getTokIdent
intP = fieldP getTokInt
You could derive the getXXX selectors using Template Haskell or generics, though it's likely not worth it.

Related

Haskell : Operator Parser keeps going to undefined rather than inputs

I'm practicing writing parsers. I'm using Tsodings JSON Parser video as reference. I'm trying to add to it by being able to parse arithmetic of arbitrary length and I have come up with the following AST.
data HVal
= HInteger Integer -- No Support For Floats
| HBool Bool
| HNull
| HString String
| HChar Char
| HList [HVal]
| HObj [(String, HVal)]
deriving (Show, Eq, Read)
data Op -- There's only one operator for the sake of brevity at the moment.
= Add
deriving (Show, Read)
newtype Parser a = Parser {
runParser :: String -> Maybe (String, a)
}
The following functions is my attempt of implementing the operator parser.
ops :: [Char]
ops = ['+']
isOp :: Char -> Bool
isOp c = elem c ops
spanP :: (Char -> Bool) -> Parser String
spanP f = Parser $ \input -> let (token, rest) = span f input
in Just (rest, token)
opLiteral :: Parser String
opLiteral = spanP isOp
sOp :: String -> Op
sOp "+" = Add
sOp _ = undefined
parseOp :: Parser Op
parseOp = sOp <$> (charP '"' *> opLiteral <* charP '"')
The logic above is similar to how strings are parsed therefore my assumption was that the only difference was looking specifically for an operator rather than anything that's not a number between quotation marks. It does seemingly begin to parse correctly but it then gives me the following error:
λ > runParser parseOp "\"+\""
Just ("+\"",*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
undefined, called at /DIRECTORY/parser.hs:110:11 in main:Main
I'm confused as to where the error is occurring. I'm assuming it's to do with sOp mainly due to how the other functions work as intended as the rest of parseOp being a translation of the parseString function:
stringLiteral :: Parser String
stringLiteral = spanP (/= '"')
parseString :: Parser HVal
parseString = HString <$> (charP '"' *> stringLiteral <* charP '"')
The only reason why I have sOp however is that if it was replaced with say Op, I would get the error that the following doesn't exist Op :: String -> Op. When I say this my inclination was that the string coming from the parsed expression would be passed into this function wherein I could return the appropriate operator. This however is incorrect and I'm not sure how to proceed.
charP and Applicative Instance
charP :: Char -> Parser Char
charP x = Parser $ f
where f (y:ys)
| y == x = Just (ys, x)
| otherwise = Nothing
f [] = Nothing
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input', a) <- q input
Just (input', f a)

The implementation of (<*>) is the culprit. You did not use input' in the next call to q, but used input instead. As a result you pass the string to the next parser without "eating" characters. You can fix this with:
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input'', a) <- q input'
Just (input'', f a)
With the updated instance for Applicative, we get:
*Main> runParser parseOp "\"+\""
Just ("",Add)

Parser with multiple return types

I have problem with parsers that can manage multiple types, e.g., the parser for arithmetic expressions. At the moment my parser takes in input only integers:
aexpr :: Parser Int
aexpr = aterm `chainl` addOp
aterm :: Parser Int
aterm = afactor `chainl` mulOp
afactor :: Parser Int
afactor = parenP '(' aexpr ')'
<|>
do
s <- sign
a <- aexpr
return (s * a)
<|>
token int_const
This works well for two integers, but I am introducing the float type, so the parser for arithmetic expressions can have in input two Int, two Float or two mixed types of Int and Float. The result type of the parser should be Int or Float based on the inputs (i.e., Int if both inputs are Int, Float otherwise).
My question is: what is the best way to manage this? A simple way is to just manage all the inputs as Float, so integer values are converted into float values. But I don't like this solution: the end result will be always Float.
Should I create two different parsers, i.e., one for Int and one for Float? But the two parsers would be pretty much the same.
There is a better solution?

When designing DSLs, it's very common to define a new data type representing all the different shapes values can have:
data Value
= I Int
| D Double
deriving (Eq, Ord, Read, Show)
You may want to implement some helper functions, like:
binOp :: (forall a. Num a => a -> a -> a) -> Value -> Value -> Parser Value
binOp f (I a) (I b) = pure (I (f a b))
binOp f (D a) (D b) = pure (D (f a b))
binOp _ _ _ = fail "oof, tried to mix Ints and Doubles"
Then your mulOp and addOp implementations could call binOp (*) and binOp (+) somewhere in their implementation, for example.
But I would consider another approach than using such helpers directly. I would propose introducing an intermediate representation for which parsing "always succeeds", then add a separate type-checking phase where you can throw errors about mixing ints and doubles... or do appropriate casts, or whatever your DSL wants to happen there. So:
data BinOp = Plus | Times deriving (Bounded, Enum, Eq, Ord, Read, Show)
data Type = Int | Double deriving (Bounded, Enum, Eq, Ord, Read, Show)
data DSL
= ILit Int
| DLit Double
| BinOp BinOp DSL DSL
deriving (Eq, Ord, Read, Show)
Then you can write things with types like
typeCheck :: DSL -> These [TypeError] Type -- check for mixing
unsafeEval :: DSL -> Value -- implementation uses incomplete patterns and assumes no mixing
eval :: DSL -> Either [TypeError] Value
eval t = case typeCheck t of
That _ -> Right (unsafeEval t)
These [] _ -> Right (unsafeEval t)
These es _ -> Left es
This es -> Left es
or whatever.

Parsing a Complex Number from Scratch

Given a data type data CI = CI Int Int, representing a complex number, I want to build a parser for CI that can convert "a" to CI a 0 and "(a,b)" to CI a b. For example, I want a function parseCI such runParser parseCI "(1,2)" returns the value [(CI 1 2, "")] (ideally, but something similar is fine). I also want to make CI an instance of read.
I would like to do this using functions and definitions from the code below (basically, without anything advanced, like Parsec), but I'm not sure where to start. Some starting code to set me on the right track and/or a hint would be helpful. I'm not looking for a full answer, as I'd like to figure that out myself.
module Parser where
import Control.Applicative
import Control.Monad
newtype Parser a = Parser { runParser :: String -> [(a,String)] }
satisfy :: (Char -> Bool) -> Parser Char
satisfy f = Parser $ \s -> case s of
[] -> []
a:as -> [(a,as) | f a]
char :: Char -> Parser Char
char = satisfy . (==)
string :: String -> Parser String
string str = Parser $ \s -> [(t,u) | let (t,u) = splitAt (length str) s, str == t]
instance Functor Parser where
fmap f p = Parser $ \s ->
[ (f a,t)
| (a,t) <- runParser p s
]
instance Applicative Parser where
pure a = Parser $ \s -> [(a,s)]
af <*> aa = Parser $ \s ->
[ (f a,u)
| (f,t) <- runParser af s
, (a,u) <- runParser aa t
]
instance Alternative Parser where
empty = Parser $ \s -> []
p1 <|> p2 = Parser $ (++) <$> runParser p1 <*> runParser p2`
instance Monad Parser where
return = pure
ma >>= f = Parser $ \s ->
[ (b,u)
| (a,t) <- runParser ma s
, (b,u) <- runParser (f a) t
]
instance MonadPlus Parser where
mzero = empty
mplus = (<|>)

You've probably already seen it, but in case you haven't: Monadic Parsing in Haskell sets up parsing like this.
Since you have two different ways of parsing CI, you might want to approach this as two problems: make one parser parseCI1 that parses "a" to CI a 0 and make another parser parseCI2 that parses "(a,b)" to CI a b. Then, you can combine these into one with
parseCI = parseCI1 <|> parseCI2
For both of these subparsers, you will need some way of parsing integers: parseInt :: Parser Int. When making parseInt, you will likely want to use some combination of satisfy, isDigit, read, and possibly some (depending on how you go about solving this).
Making CI an instance of read is a bit more straightforward once you have parseCI done:
instance Read CI where
readsPrec _ = runParser parseCI

Minimal Purely Applicative Parser

I'm trying to figure out how to build a "purely applicative parser" based on a simple parser implementation. The parser would not use monads in its implementation. I asked this question previously but mis-framed it so I'm trying again.
Here is the basic type and its Functor, Applicative and Alternative implementations:
newtype Parser a = Parser { parse :: String -> [(a,String)] }
instance Functor Parser where
fmap f (Parser cs) = Parser (\s -> [(f a, b) | (a, b) <- cs s])
instance Applicative Parser where
pure = Parser (\s -> [(a,s)])
(Parser cs1) <*> (Parser cs2) = Parser (\s -> [(f a, s2) | (f, s1) <- cs1 s, (a, s2) <- cs2 s1])
instance Alternative Parser where
empty = Parser $ \s -> []
p <|> q = Parser $ \s ->
case parse p s of
[] -> parse q s
r -> r
The item function takes a character off the stream:
item :: Parser Char
item = Parser $ \s ->
case s of
[] -> []
(c:cs) -> [(c,cs)]
At this point, I want to implement digit. I can of course do this:
digit = Parser $ \s ->
case s of
[] -> []
(c:cs) -> if isDigit c then [(c, cs)] else []
but I'm replicating the code of item. I'd like to implement digit based on item.
How do I go about implementing digit, using item to take a character off the stream and then checking to see if the character is a digit without bringing monadic concepts into the implementation?

First, let us write down all the tools we currently have at hand:
-- Data constructor
Parser :: (String -> [(a, String)]) -> Parser a
-- field accessor
parse :: Parser a -> String -> [(a, String)]
-- instances, replace 'f' by 'Parser'
fmap :: Functor f => (a -> b) -> f a -> f b
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
pure :: Applicative f => a -> f a
-- the parser at hand
item :: Parser Char
-- the parser we want to write with item
digit :: Parser Char
digit = magic item
-- ?
magic :: Parser Char -> Parser Char
The real question at hand is "what is magic"? There are only so many things we can use. Its type indicates fmap, but we can rule that out. All we can provide is some function a -> b, but there is no f :: Char -> Char that makes fmap f indicate a failure.
What about (<*>), can this help? Well, again, the answer is no. The only thing we can do here is to take the (a -> b) out of the context and apply it; whatever that means in the context of the given Applicative. We can rule pure out.
The problem is that we need to check the Char that item might parse and change the context. We need something like Char -> Parser Char
But we didn't rule Parser or parse out!
magic p = Parser $ \s ->
case parse p s of -- < item will be used here
[(c, cs)] -> if isDigit c then [(c, cs)] else []
_ -> []
Yes, I know, it's duplicate code, but now it's using item. It's using item before inspecting the character. That's the only way we can use item here. And now, there is some kind of sequence implied: item has to succeed before digit can do it's work.
Alternatively, we could have tried this way:
digit' c :: Char -> Parser Char
digit' c = if isDigit c then pure c else empty
But then fmap digit' item would have the type Parser (Parser Char), which can only get collapsed with a join-like function. That's why monads are more powerful than applicative.
That being said, you can get around all of the monad requirements if you use a more general function first:
satisfy :: (Char -> Bool) -> Parser Char
satisfy = Parser $ \s ->
case s of
(c:cs) | p c -> [(c, cs)]
_ -> []
You can then define both item and digit in terms of satisfy:
item = satisfy (const True)
digit = satisfy isDigit
That way digit does not have to inspect the result of a previous parser.

Functors allow you to act on somethings values. For example, if you have a list [1,2,3], you can change the contents. Note that Functors do not allow changing structure. map can not change the length of a list.
Applicatives allow you to combine structure, and the content is mushed together somehow. But the values can not change influence the structure.
Namely, given an item, we can change its structure, and we can change its content, but the content can not change the structure. We can't choose to fail on some content and not other.
If anyone knows how to state this more formally and provably, I'm all ears (it probably has to do with free theorems).

Haskell Parser not working

I am trying to understand Parsers. Therefore I have created my own parser. Unfortunately it does not work. Why?
type Parser a = String -> [(a, String)]
preturn :: a -> Parser a
preturn t = \inp -> [(t,inp)]
pfailure :: Parser a
pfailure = \inp -> []
pitem :: Parser Char
pitem = \inp -> case inp of
[] -> []
(x:xs) -> [(x,xs)]
parse :: Parser a -> Parser a
--parse :: Parser a -> String -> [(a,String)]
parse p inp = p inp
{-
combine :: Parser a -> Parser b -> Parser (a,b)
combine p1 p2 = \inp -> p2 t output
where
p1 inp = ([
-}
-- firstlast :: Parser (Char,Char)
firstlast = do
x <- pitem
z <- pitem
y <- pitem
preturn (x,y)
another = do
x <- pitem
y <- pitem
Firstlast is supposed to take a string and return the first and third character. Unfortunately, it returns odd values, and it does not accept its type (Parser (Char,Char))
For example,
*Main> firstlast "abc"
[(([('a',"bc")],[('a',"bc")]),"abc")]
What should happen is:
*Main> firstlast "abc"
[("ac",[])]

Please use code that compiles. Your another function does not.
What's the problem?
Your code for firstlast and another makes use of do-notation. And the way you're using pitem here, it looks as if you're expecting Parser to be a monad. But it isn't, at least not in the way you expect it to be.
There is a monad instance pre-defined which make GHC think that Parser is a monad, namely
instance Monad ((->) r) where
return = const
f >>= k = \ r -> k (f r) r
What this instance says is that, for any type r the function type r -> ... can be considered a monad, namely by distributing the parameter everywhere. So returning something in this monad amounts to producing a value ignoring the parameter of type r, and binding a value means that you take r and pass it on both to the left and right computation.
This is not what you want for a parser. The input string will be distributed to all computations. So each pitem will operate on the original input string. Furthermore, as
pitem :: String -> [(Char, String)]
the result of your monadic computation will be of type [(Char, String)], so x and y are both of this type. That's why you get the result
[(([('a',"bc")],[('a',"bc")]),"abc")]
You're calling pitem three times on the same input string. You're putting two results in a pair, and you're preturn-ing the whole thing.
How to fix it?
You need to define your own monad instance for the Parser type. You cannot do that directly, because Parser is a type synonym, and type synonyms cannot be partially applied,
so you cannot write
instance Monad Parser where
...
Instead, you have to wrap Parser in a new datatype or newtype:
newtype Parser a = Parser { parse :: String -> [(a, String)] }
This gives you a constructor Parser and a function parse to convert between the unwrapped and wrapped parser types:
Parser :: String -> [(a, String)] -> Parser a
parse :: Parser a -> String -> [(a, String)]
This implies you'll have to adapt your other functions. For example, preturn becomes
preturn :: a -> Parser a
preturn t = Parser (\inp -> [(t,inp)])
Change pfailure and pitem similarly. Then, you have to define the Monad instance:
instance Monad Parser where
return = preturn
(>>=) = ... -- to be completed by you
The function (>>=) is not contained in your code above. You'll want to implement the behaviour that the input is passed to the first parser, and for every result of that, the result and the remaining input are passed to the second argument of (>>=). Once this is done, a call to parse firstlast "abc" will have the following result:
[(('a','c'),"")]
which isn't quite what you want in your question, but I believe it's what you're actually after.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

How do I generalize a Parser which extracts values from tokens? - parsing

Related

Haskell : Operator Parser keeps going to undefined rather than inputs

Parser with multiple return types

Parsing a Complex Number from Scratch

Minimal Purely Applicative Parser

Haskell Parser not working

Categories

Resources