In MegaParsec is there a way of anding 2 parsers? - parsing

If I have a parser like:
notZeroOrOne :: Parser Char
notZeroOrOne = noneOf ['0' , '1']
Is there a way I can combine the rules of another parser like digitChar so I can get a parser that will only pass if both parsers would pass?
Something like
biggerThanOne :: Parser Char
biggerThanOne = digitChar && notZeroOrOne

As user2407038 suggests in the comments, it's achievable using the lookAhead function.
biggerThanOne :: Parser Char
biggerThanOne =
lookAhead digitChar *> notZeroOrOne
However, the parsers are sequential by nature, so it's both more efficient and comprehendable to apply the sequential logic. E.g., using the Monad instance of Parser:
biggerThanOne :: Parser Char
biggerThanOne =
do
c <- digitChar
if c /= '0' && c /= '1'
then return c
else unexpected "Not bigger than one"
Or MonadPlus:
biggerThanOne :: Parser Char
biggerThanOne =
mfilter (\c -> c /= '0' && c /= '1') digitChar
which can be refactored to use the Ord instance of Char and pretty much clearly express your intent:
biggerThanOne :: Parser Char
biggerThanOne =
mfilter (> '1') digitChar

In the uu-parsinglib there is an amb combinator, which gives you all possible parses. You can use this to see whether you get both parses. If you get only a single parse you can use a monad to fail
p pAnd q = amb (p <|> q) >>= \ r -> if length r == 2 then return ... else pFail

Related

Using chainl1 correctly with infix in Haskell

For the MVE code below it outputs [] rather than the expected Not (Oper Eq 2 2)) for the input parseString "2+2" which is supposed to call pOper. My guess is that pOper would expect three arguments for the anonymous function to work. That is 3 strings. However due to partial call of a function only one argument is passed. Is there a way to work around to preserve the type signature of pOper while dealing with the Not and at the same time not changing the type definitions?
import Data.Char
import Text.ParserCombinators.ReadP
import Control.Applicative ((<|>))
type Parser a = ReadP a
data Value =
IntVal Int
deriving (Eq, Show, Read)
data Exp =
Const Value
| Oper Op Exp Exp
| Not Exp
deriving (Eq, Show, Read)
data Op = Plus | Minus | Eq
deriving (Eq, Show, Read)
space :: Parser Char
space = satisfy isSpace
spaces :: Parser String
spaces = many space
space1 :: Parser String
space1 = many1 space
symbol :: String -> Parser String
symbol = token . string
token :: Parser a -> Parser a
token combinator = (do spaces
combinator)
parseString input = readP_to_S (do
e <- pExpr
token eof
return e) input
pExpr :: Parser Exp
pExpr = chainl1 pTerm pOper
pTerm :: Parser Exp
pTerm =
(do
pv <- numConst
skipSpaces
return pv)
numConst :: Parser Exp
numConst =
(do
skipSpaces
y <- munch isDigit
return (Const (IntVal (read y)))
)
-- Parser for an operator
pOper :: ReadP (Exp -> Exp -> Exp)
pOper = symbol "+" >> return (Oper Plus)
<|> (symbol "-" >> return (Oper Minus))
<|> (symbol "=" >> return (Oper Eq))
<|> (symbol "!=" >> return (\e1 e2 -> Not (Oper Eq e1 e2)))
There's nothing wrong with your parser for !=. Rather, your parser for operators in general is broken: it only parses the first operator correctly. A simpler version of your pOper would be
pOper = a >> b
<|> (c >> d)
But because of precedence, this isn't the same as (a >> b) <|> (c >> d). Actually, it's a >> (b <|> (c >> d))! So the symbol your first alternative parses is accidentally mandatory. It would parse 2+!=2 instead.
So, you could fix this by just adding in the missing parentheses. But if, like me, you find it a little tacky to rely so much on operator precedence for semantic meaning, consider something that's more obviously safe, using the type system to separate the clauses from the delimiters:
pOper :: ReadP (Exp -> Exp -> Exp)
pOper = asum [ symbol "+" >> return (Oper Plus)
, symbol "-" >> return (Oper Minus)
, symbol "=" >> return (Oper Eq)
, symbol "!=" >> return (\e1 e2 -> Not (Oper Eq e1 e2))
]
This way, you have a list of independent parsers, not a single parser built with alternation. asum (from Control.Applicative) does the work of combining that list into alternatives. It means the same thing, of course, but it means you don't have to learn any operator precedence tables, because , can only be a list item separator.
The best way I can think of to solve the problem is by creating these to modificatoins: 1) this alternative in the expression
pExpr :: Parser Exp
pExpr =
(do pv <- chainl1 pTerm pOper
pv2 <- pOper2 pv
return pv2)
<|> chainl1 pTerm pOper
And 2) this helper function to deal with infix patterns
pOper2 :: Exp -> Parser Exp
pOper2 e1 = (do
symbol "!="
e2 <- numConst
return (Not (Oper Eq e1 e2)))
This is the output, althought I don't know if there will be problems if other operations such as / and * which has different associativety are to be taken into account as well.
parseString "2+4+6"
[(Oper Plus (Oper Plus (Const (IntVal 2)) (Const (IntVal 4))) (Const (IntVal 6)),"")]
ghci> parseString "2+4+6 != 2"
[(Not (Oper Eq (Oper Plus (Oper Plus (Const (IntVal 2)) (Const (IntVal 4))) (Const (IntVal 6))) (Const (IntVal 2))),"")]
ghci> parseString "2 != 4"
[(Not (Oper Eq (Const (IntVal 2)) (Const (IntVal 4))),"")]

Haskell monadic parser with anamorphisms

My problem is how to combine the recursive, F-algebra-style recursive type definitions, with monadic/applicative-style parsers, in way that would scale to a realistic programming language.
I have just started with the Expr definition below:
data ExprF a = Plus a a |
Val Integer deriving (Functor,Show)
data Rec f = In (f (Rec f))
type Expr = Rec ExprF
and I am trying to combine it with a parser which uses anamorphisms:
ana :: Functor f => (a -> f a) -> a -> Rec f
ana psi x = In $ fmap (ana psi) (psi x)
parser = ana psi
where psi :: String -> ExprF String
psi = ???
as far as I could understand, in my example, psi should either parse just an integer, or it should decide that the string is a <expr> + <expr> and then (by recursively calling fmap (ana psi)), it should parse the left-hand side and the right-hand side expressions.
However, (monadic/applicative) parsers don't work like that:
they first attempt parsing the left-hand expression,
the +,
and the right-hand expression
One solution that I see, is to change the type definition for Plus a a to Plus Integer a, such that it reflects the parsing process, however this doesn't seem like the best avenue.
Any suggestions (or reading directions) would be welcome!
If you need a monadic parser, you need a monad in your unfold:
anaM :: (Traversable f, Monad m) => (a -> m (f a)) -> a -> m (Rec f)
anaM psiM x = In <$> (psiM x >>= traverse (anaM psiM))
Then you can write something that parses just one level of an ExprF like this:
parseNum :: Parser Integer
parseNum = -- ...
char :: Char -> Parser Char
char c = -- ...
parseExprF :: Maybe Integer -> Parser (ExprF (Maybe Integer))
parseExprF (Just n) = pure (Val n)
parseExprF Nothing = do
n <- parseNum
empty
<|> (Plus (Just n) Nothing <$ char '+')
<|> (pure (Val n))
Given that, you now have your recursive Expr parser:
parseExpr :: Parser Expr
parseExpr = anaM parseExprF Nothing
You will need to have instances of Foldable and Traversable for ExprF, of course, but the compiler can write these for you and they are not themselves recursive.

Purely Applicative Parser using Alternative

In a previous post, a user offered an implementation of a purely applicative parser for Haskell (code originally from here). Below is the partial implementation of that parser:
{-# LANGUAGE Rank2Types #-}
import Control.Applicative (Alternative(..))
import Data.Foldable (asum, traverse_)
The type:
newtype Parser a = Parser {run :: forall f. Alternative f => (Char -> f ()) -> f a}
The instances:
instance Functor Parser where
fmap f (Parser cont) = Parser $ \char -> f <$> cont char
instance Applicative Parser where
pure a = Parser $ \char -> pure a
(Parser contf) <*> (Parser cont) = Parser $ \char -> (contf char) <*> (cont char)
instance Alternative Parser where
empty = Parser $ \char -> empty
(Parser cont) <|> (Parser cont') = Parser $ \char -> (cont char) <|> (cont' char)
some (Parser cont) = Parser $ \char -> some $ cont char
many (Parser cont) = Parser $ \char -> many $ cont char
Some example parsers:
item = Parser $ \char -> asum $ map (\c -> c <$ char c) ['A'..'z']
digit = Parser $ \char -> asum $ map (\c -> c <$ char (head $ show c)) [0..9]
string s = Parser $ \char -> traverse_ char s
Unfortunately, I'm having a hard time trying to understand how I might use this parser implementation. In particular, I do not understand what Char -> f () should/could be and how I could use this to do simple parsing, e.g. to extra a digit out of an input string. I'd like a concrete example if possible. Could someone please shed some light?
In forall f. Alternative f => (Char -> f ()) -> f a, the Char -> f () is something that you are provided with. Your mission, should you choose to accept it, is to then turn that into an f a using only these two bits:
The Char -> f () function (i.e. a way to parse a single character: if the next character matches the argument, the parsing succeeds; otherwise it doesn't.)
The Alternative instance of f
So how would you parse a single digit into an Int? It would have to be of the form
digit :: Parser Int
digit = Parser $ \parseChar -> _
In _, we have to create an f Int using the kit parseChar :: Char -> f () and Alternative f. We know how to parse a single '0' character: parseChar '0' succeds iff the next character is '0'. We can turn it into a value of Int via f's Functor instance, arriving at
digit0 :: Parser Int
digit0 = Parser $ \parseChar -> fmap (const 0) (parseChar '0')
But f is not just Functor, it is also Alternative, so we can write digit in long-form as
digit :: Parser Int
digit = Parser $ \parseChar -> fmap (const 0) (parseChar '0') <|>
fmap (const 1) (parseChar '1') <|>
fmap (const 2) (parseChar '2') <|>
fmap (const 3) (parseChar '3') <|>
fmap (const 4) (parseChar '4') <|>
fmap (const 5) (parseChar '5') <|>
fmap (const 6) (parseChar '6') <|>
fmap (const 7) (parseChar '7') <|>
fmap (const 8) (parseChar '8') <|>
fmap (const 9) (parseChar '9')
And from here, it is merely a matter of pedestrian Haskell programming to cut down on the cruft, arriving at something like
digit :: Parser Int
digit = Parser $ \parseChar -> asum [fmap (const d) (parseChar c) | d <- [0..9], let [c] = show d]
which we can further simplify by noting that fmap (const x) f can be written as x <$ f, giving
digit :: Parser Int
digit = Parser $ \parseChar -> asum [d <$ parseChar c | d <- [0..9], let [c] = show d]
The Char -> f () part represents matching on a single character. Namely, if you do char 'c', it will match on 'c' and fail on everything else.
To use it, you can convert it to, say Parsec:
convert :: Parser a -> Parsec a
convert p = run p anyChar
p is essentially of the type forall f. Alternative f => (Char -> f ()) -> f a, which specializes to (Char -> Parsec ()) -> Parsec a. We pass in anyChar, and it will produce a Parsec a value by using anyChar and any Alternative operations.
Basically, a Parser a it is a function that, given away to match on a single character, and an Alternative instance, it will produce an Alternative value.

How can I parse a float with a comma in place of the decimal point?

I want to parse Float values from a file where they are stored using comma as the decimal separator. Thus i need a function myParse :: String -> Float such that, for instance, myParse "23,46" == 23.46.
I have some ideas about how to do this, but they all seem overcomplicated, for example:
replace , with a . in the string and use read; or
follow this FP Complete blogpost (entitled Parsing Floats With Parsec), and challenge the curse of the monomorphism restriction.
Is there a simpler way, or do I really need to use a parsing library? In the second case, could you please paste some suggestions in order to get me started? The monomorphism restriction scares me, and I believe that there has to be a way to do this without using language extensions.
Replacing , by . and then call read is straightforward enough; you just need to remember to use your own specialized function instead of plain old read:
readFloatWithComma :: String -> Float
readFloatWithComma = read . sanitize
where
sanitize = map (\c -> if c == ',' then '.' else c)
In GHCi:
λ> readFloatWithComma "23,46"
23.46
Regarding the parsec approach, despite what the article you link to suggest, the monomorphism restriction needs not be a worry, as long as you have type signatures for all your top-level bindings. In particular, the following code doesn't need any language extensions to compile properly (at least, in GHC 7.10.1):
import Text.Parsec
import Text.Parsec.String ( Parser )
import Control.Applicative hiding ( (<|>) )
infixr 5 <++>
(<++>) :: Applicative f => f [a] -> f [a] -> f [a]
a <++> b = (++) <$> a <*> b
infixr 5 <:>
(<:>) :: Applicative f => f a -> f [a] -> f [a]
a <:> b = (:) <$> a <*> b
number :: Parser String
number = many1 digit
plus :: Parser String
plus = char '+' *> number
minus :: Parser String
minus = char '-' <:> number
integer :: Parser String
integer = plus <|> minus <|> number
float :: Parser Float
float = fmap rd $ integer <++> decimal <++> exponent
where rd = read :: String -> Float
decimal = option "" $ ('.' <$ char ',') <:> number
exponent = option "" $ oneOf "eE" <:> integer
In GHCi:
λ> parseTest float "23,46"
23.46

Using Parsec to parse regular expressions

I'm trying to learn Parsec by implementing a small regular expression parser. In BNF, my grammar looks something like:
EXP : EXP *
| LIT EXP
| LIT
I've tried to implement this in Haskell as:
expr = try star
<|> try litE
<|> lit
litE = do c <- noneOf "*"
rest <- expr
return (c : rest)
lit = do c <- noneOf "*"
return [c]
star = do content <- expr
char '*'
return (content ++ "*")
There are some infinite loops here though (e.g. expr -> star -> expr without consuming any tokens) which makes the parser loop forever. I'm not really sure how to fix it though, because the very nature of star is that it consumes its mandatory token at the end.
Any thoughts?
You should use Parsec.Expr.buildExprParser; it is ideal for this purpose. You simply describe your operators, their precedence and associativity, and how to parse an atom, and the combinator builds the parser for you!
You probably also want to add the ability to group terms with parens so that you can apply * to more than just a single literal.
Here's my attempt (I threw in |, +, and ? for good measure):
import Control.Applicative
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
data Term = Literal Char
| Sequence [Term]
| Repeat (Int, Maybe Int) Term
| Choice [Term]
deriving ( Show )
term :: Parser Term
term = buildExpressionParser ops atom where
ops = [ [ Postfix (Repeat (0, Nothing) <$ char '*')
, Postfix (Repeat (1, Nothing) <$ char '+')
, Postfix (Repeat (0, Just 1) <$ char '?')
]
, [ Infix (return sequence) AssocRight
]
, [ Infix (choice <$ char '|') AssocRight
]
]
atom = msum [ Literal <$> lit
, parens term
]
lit = noneOf "*+?|()"
sequence a b = Sequence $ (seqTerms a) ++ (seqTerms b)
choice a b = Choice $ (choiceTerms a) ++ (choiceTerms b)
parens = between (char '(') (char ')')
seqTerms (Sequence ts) = ts
seqTerms t = [t]
choiceTerms (Choice ts) = ts
choiceTerms t = [t]
main = parseTest term "he(llo)*|wor+ld?"
Your grammar is left-recursive, which doesn’t play nice with try, as Parsec will repeatedly backtrack. There are a few ways around this. Probably the simplest is just making the * optional in another rule:
lit :: Parser (Char, Maybe Char)
lit = do
c <- noneOf "*"
s <- optionMaybe $ char '*'
return (c, s)
Of course, you’ll probably end up wrapping things in a data type anyway, and there are a lot of ways to go about it. Here’s one, off the top of my head:
import Control.Applicative ((<$>))
data Term = Literal Char
| Sequence [Term]
| Star Term
expr :: Parser Term
expr = Sequence <$> many term
term :: Parser Term
term = do
c <- lit
s <- optionMaybe $ char '*' -- Easily extended for +, ?, etc.
return $ if isNothing s
then Literal c
else Star $ Literal c
Maybe a more experienced Haskeller will come along with a better solution.

Resources