How do I parse S-expressions into a data structure in Haskell?

How do I parse S-expressions into a data structure in Haskell? - parsing

I'm new to Haskell and could use some guidance.
The challenge: take an S-expression and parse it into a record.
Where I have succeeded: I can take a file and read it into a parsed String.
Yet, using parsing Text to DFA s.t
let
toDFA :: [L.Text] -> EntryDFA
toDFA t =
let [q,a,d,s,f] = t
in EntryDFA {
state = read q
,alpha = read a
,delta = read d
,start = read s
,final = read f }
returns this error:
• Couldn't match type ‘L.Text’ with ‘[Char]’
Expected type: String
Actual type: L.Text
There must be a more idiomatic approach.

read is a partial function with type Read a => String -> a, which throws an exception on parsing failure. Normally you want to avoid it (use readMaybe instead if you have a string). String and L.Text are different types, which is why you're getting an error.
Your sample code is missing an extra ) after the trans-func.
I'm using the Megaparsec package which provides an easy way to work with parser combinators. The author of the library has written a longer tutorial here.
The basic idea is that Parser a is the type of a value that can parse something of type a. In Text.Megaparsec there are several functions which you can use (parse, parseMaybe etc.), to "run" the parser on a "stringy" data type (e.g. String or strict/lazy Text).
When you use do notation for IO, it means "do one action after another". Similarly, you can use do notation with Parser, it means "parse this one thing, then parse the next thing".
p1 *> p2 means run the parser p1, run p2 and return the result of running p2. p1 <* p2 means run the parser p1, run p2 and return the result of running p1. You can also look up documentation on Hoogle in case you're having trouble understanding something.
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE NamedFieldPuns #-}
-- In practice, many of these imports would be unqualified, but I've
-- opted for explicitness for clarity.
import Control.Applicative (empty, many, some, (<*), (*>))
import Control.Exception (try, IOException)
import Data.Maybe (fromMaybe)
import Data.Set (Set)
import Data.Text (Text)
import qualified Data.Set as Set
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import qualified Text.Megaparsec as MP
import qualified Text.Megaparsec.Char as MPC
import qualified Text.Megaparsec.Char.Lexer as MPCL
type Q = Text
type E = Char
data EntryDFA = EntryDFA
{ state :: Set Q
, alpha :: Set E
, delta :: Set (Q,E,Q)
, start :: Q
, final :: Set Q
} deriving Show
inputFile = "foo.sexp"
main :: IO ()
main = do
-- read file and check for exception instead of checking if
-- it exists and then trying to read it
result <- try (TIO.readFile inputFile)
case result of
Left e -> print (e :: IOException)
Right txt -> do
case MP.parse dfaParser inputFile txt of
Left e -> print e
Right dfa -> print dfa
type Parser = MP.Parsec () Text
-- There are no comments in the S-exprs, so leave those empty
spaceConsumer :: Parser ()
spaceConsumer = MPCL.space MPC.space1 empty empty
symbol :: Text -> Parser Text
symbol txt = MPCL.symbol spaceConsumer txt
parens :: Parser a -> Parser a
parens p = MP.between (symbol "(") (symbol ")") p
setP :: Ord a => Parser a -> Parser (Set a)
setP p = do
items <- parens (p `MP.sepBy1` (symbol ","))
return (Set.fromList items)
pair :: Parser a -> Parser b -> Parser (a, b)
pair p1 p2 = parens $ do
x1 <- p1
x2 <- symbol "," *> p2
return (x1, x2)
stateP :: Parser Text
stateP = do
c <- MPC.letterChar
cs <- many MPC.alphaNumChar
return (T.pack (c:cs))
dfaParser :: Parser EntryDFA
dfaParser = do
() <- spaceConsumer
(_, state) <- pair (symbol "states") (setP stateP)
(_, alpha) <- pair (symbol "alpha") (setP alphaP)
(_, delta) <- pair (symbol "trans-func") (setP transFuncP)
(_, start) <- pair (symbol "start") valP
(_, final) <- pair (symbol "final") (setP valP)
return (EntryDFA {state, alpha, delta, start, final})
where
alphaP :: Parser Char
alphaP = MPC.letterChar <* spaceConsumer
transFuncP :: Parser (Text, Char, Text)
transFuncP = parens $ do
s1 <- stateP
a <- symbol "," *> alphaP
s2 <- symbol "," *> stateP
return (s1, a, s2)
valP :: Parser Text
valP = fmap T.pack (some MPC.digitChar)

Related

Parser written in Haskell not working as intended

I was playing around with Haskell's parsec library. I was trying to parse a hexadecimal string of the form "#x[0-9A-Fa-f]*" into an integer. This the code I thought would work:
module Main where
import Control.Monad
import Numeric
import System.Environment
import Text.ParserCombinators.Parsec hiding (spaces)
parseHex :: Parser Integer
parseHex = do
string "#x"
x <- many1 hexDigit
return (fst (head (readHex x)))
testHex :: String -> String
testHex input = case parse parseHex "lisp" input of
Left err -> "Does not match " ++ show err
Right val -> "Matched" ++ show val
main :: IO ()
main = do
args <- getArgs
putStrLn (testHex (head args))
And then I tried testing the testHex function in Haskell's repl:
GHCi, version 8.6.5: http://www.haskell.org/ghc/ :? for help
[1 of 1] Compiling Main ( src/Main.hs, interpreted )
Ok, one module loaded.
*Main> testHex "#xcafebeef"
"Matched3405692655"
*Main> testHex "#xnothx"
"Does not match \"lisp\" (line 1, column 3):\nunexpected \"n\"\nexpecting hexadecimal digit"
*Main> testHex "#xcafexbeef"
"Matched51966"
The first and second try work as intended. But in the third one, the string is matching upto the invalid character. I do not want the parser to do this, but rather not match if any digit in the string is not a valid string. Why is this happening, and how do if fix this?
Thank you!

You need to place eof at the end.
parseHex :: Parser Integer
parseHex = do
string "#x"
x <- many1 hexDigit
eof
return (fst (head (readHex x)))
Alternatively, you can compose it with eof where you use it if you want to reuse parseHex in other places.
testHex :: String -> String
testHex input = case parse (parseHex <* eof) "lisp" input of
Left err -> "Does not match " ++ show err
Right val -> "Matched" ++ show val

Haskell : Operator Parser keeps going to undefined rather than inputs

I'm practicing writing parsers. I'm using Tsodings JSON Parser video as reference. I'm trying to add to it by being able to parse arithmetic of arbitrary length and I have come up with the following AST.
data HVal
= HInteger Integer -- No Support For Floats
| HBool Bool
| HNull
| HString String
| HChar Char
| HList [HVal]
| HObj [(String, HVal)]
deriving (Show, Eq, Read)
data Op -- There's only one operator for the sake of brevity at the moment.
= Add
deriving (Show, Read)
newtype Parser a = Parser {
runParser :: String -> Maybe (String, a)
}
The following functions is my attempt of implementing the operator parser.
ops :: [Char]
ops = ['+']
isOp :: Char -> Bool
isOp c = elem c ops
spanP :: (Char -> Bool) -> Parser String
spanP f = Parser $ \input -> let (token, rest) = span f input
in Just (rest, token)
opLiteral :: Parser String
opLiteral = spanP isOp
sOp :: String -> Op
sOp "+" = Add
sOp _ = undefined
parseOp :: Parser Op
parseOp = sOp <$> (charP '"' *> opLiteral <* charP '"')
The logic above is similar to how strings are parsed therefore my assumption was that the only difference was looking specifically for an operator rather than anything that's not a number between quotation marks. It does seemingly begin to parse correctly but it then gives me the following error:
λ > runParser parseOp "\"+\""
Just ("+\"",*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
undefined, called at /DIRECTORY/parser.hs:110:11 in main:Main
I'm confused as to where the error is occurring. I'm assuming it's to do with sOp mainly due to how the other functions work as intended as the rest of parseOp being a translation of the parseString function:
stringLiteral :: Parser String
stringLiteral = spanP (/= '"')
parseString :: Parser HVal
parseString = HString <$> (charP '"' *> stringLiteral <* charP '"')
The only reason why I have sOp however is that if it was replaced with say Op, I would get the error that the following doesn't exist Op :: String -> Op. When I say this my inclination was that the string coming from the parsed expression would be passed into this function wherein I could return the appropriate operator. This however is incorrect and I'm not sure how to proceed.
charP and Applicative Instance
charP :: Char -> Parser Char
charP x = Parser $ f
where f (y:ys)
| y == x = Just (ys, x)
| otherwise = Nothing
f [] = Nothing
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input', a) <- q input
Just (input', f a)

The implementation of (<*>) is the culprit. You did not use input' in the next call to q, but used input instead. As a result you pass the string to the next parser without "eating" characters. You can fix this with:
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input'', a) <- q input'
Just (input'', f a)
With the updated instance for Applicative, we get:
*Main> runParser parseOp "\"+\""
Just ("",Add)

parsec produce strange errors when trying to handle Maybe as ParseError

If I have this code :
import Text.Parsec
ispositive a = if (a<0) then Nothing else (Just a)
f a b = a+b
parserfrommaybe :: String -> (Maybe c) -> Parsec a b c
parserfrommaybe msg Nothing = fail msg
parserfrommaybe _ (Just res) = return res
(<!>) :: Parsec a b (Maybe c) -> String -> Parsec a b c
(<!>) p1 msg = p1 >>= (parserfrommaybe msg)
integermaybeparser = (ispositive <$> integer) <!> "negative numbers are not allowed"
testparser = f <$> (integermaybeparser <* whiteSpace) <*> integermaybeparser
when I test testparser with input like this "-1 3" it gives :
Left (line 1, column 4):
unexpected "3"
negative numbers are not allowed
I expected it to give error on Column 1 and give the error message without the sentence "unexpected 3" but it seems parsec continued parsing.
Why did this happen ? and how to make parsec give the error message I expect ?

I have found the solution, the cause of is that the first parser gets run and consumes input even when failing.
The solution was to use lookAhead like this:
(<!>) :: (Monad m,Stream a m t) => ParsecT a b m (Maybe c) -> String -> ParsecT a b m c
(<!>) p1 msg = ((lookAhead p1) >>= (parserfrommaybe msg)) *> (p1 >>= (parserfrommaybe msg))
if lookAhead p1 returns Nothing then the first argument of *> would fail without consuming input because of lookAhead, now if lookAhead p1 returns Just res then it would succeed again without consuming input and the result would be obtained from the second argument of *>.
ofcourse I had to change parserfrommaybe type annotation to (Monad m) => String -> (Maybe c) -> ParsecT a b m c to satisfy ghc.

Parsing simple molecule names with Attoparsec

I find it extremely difficult to learn how to use Attoparsec, because the documentation is really just an API documentation and there are basically no tutorials around (except the one from FPComplete). If you know other places where I can learn Attoparsec, that'd be great.
I have to parse simple molecule names, in the following format: NaCl, CO2, H2O, HCN, H2O2.
An element name is an uppercase letter optionally followed by a lowercase one (I'm not considering those elements with a symbol longer than 2 characters).
An element can be followed by a number (that would be the subscript in a formula).
New version (thanks to Mark's and Tarmil's suggestions), which compiles but does not parse:
module Chem
where
import Data.Text (Text, pack)
import Control.Applicative ((<*>), (<$>))
import Data.Attoparsec.Text
data Element = Element String Int deriving (Eq, Ord, Show)
type Molecule = [Element]
parseString :: String -> Result Molecule
parseString = parse (many' parseElement) . pack
parseElement :: Parser Element
parseElement = do
el <- (++) <$> pClass "A-Z" <*> option "" (pClass "a-z")
n <- option 1 decimal
return $ Element el n
pClass :: String -> Parser String
pClass cls = (\c -> [c]) <$> satisfy (inClass cls)
Any suggestion is appreciated.
EDIT: I managed to get it running. Basically, a Partial continuation was returned, and to finish the parsing it's necessary to feed the parser with an empty bytestring. So the correct parseString would be:
parseString = flip feed empty . parse (many' parseElement) . pack
where empty is Data.Text.empty. However, since I don't need incremental parsing there is the useful function parseOnly, which does not wait for more input and returns an Either.
With that in mind, I rewrote the code like this (it works now):
module Chem
where
import Data.Text (Text, pack)
import Control.Applicative ((<*>), (<$>))
import Data.Attoparsec.Text
data Element = Element String Int deriving (Eq, Ord, Show)
type Molecule = [Element]
parseString :: String -> Either String Molecule
parseString = parseOnly (many' parseElement) . pack
parseElement :: Parser Element
parseElement = do
el <- (++) <$> pClass "A-Z" <*> option "" (pClass "a-z")
n <- option 1 decimal
return $ Element el n
pClass :: String -> Parser String
pClass cls = (\c -> [c]) <$> satisfy (inClass cls)

You have two problems in the letters parsing part:
inClass is not a parser, it is a function that is meant to be passed to satisfy.
<*> has type Parser (a -> b) -> Parser a -> Parser b, so the parser on the left should return a function. Typically, it is used like this:
pf <$> p1 <*> p2 <*> ... <*> pn
where pf is a function with n arguments.
So here you probably want something like this:
-- parse a character in the given class, and transform it to a single-char string
pClass cls = (\c -> [c]) <$> satisfy (inClass cls)
-- ...
el <- ((++) <$> pClass "A-Z" <*> pClass "a-z") <|> pClass "A-Z"
-- ...
I think this would be enhanced by using option, instead of duplicating the A-Z parser:
el <- (++) <$> pClass "A-Z" <*> option "" (pClass "a-z")

Haskell Parser not working

I am trying to understand Parsers. Therefore I have created my own parser. Unfortunately it does not work. Why?
type Parser a = String -> [(a, String)]
preturn :: a -> Parser a
preturn t = \inp -> [(t,inp)]
pfailure :: Parser a
pfailure = \inp -> []
pitem :: Parser Char
pitem = \inp -> case inp of
[] -> []
(x:xs) -> [(x,xs)]
parse :: Parser a -> Parser a
--parse :: Parser a -> String -> [(a,String)]
parse p inp = p inp
{-
combine :: Parser a -> Parser b -> Parser (a,b)
combine p1 p2 = \inp -> p2 t output
where
p1 inp = ([
-}
-- firstlast :: Parser (Char,Char)
firstlast = do
x <- pitem
z <- pitem
y <- pitem
preturn (x,y)
another = do
x <- pitem
y <- pitem
Firstlast is supposed to take a string and return the first and third character. Unfortunately, it returns odd values, and it does not accept its type (Parser (Char,Char))
For example,
*Main> firstlast "abc"
[(([('a',"bc")],[('a',"bc")]),"abc")]
What should happen is:
*Main> firstlast "abc"
[("ac",[])]

Please use code that compiles. Your another function does not.
What's the problem?
Your code for firstlast and another makes use of do-notation. And the way you're using pitem here, it looks as if you're expecting Parser to be a monad. But it isn't, at least not in the way you expect it to be.
There is a monad instance pre-defined which make GHC think that Parser is a monad, namely
instance Monad ((->) r) where
return = const
f >>= k = \ r -> k (f r) r
What this instance says is that, for any type r the function type r -> ... can be considered a monad, namely by distributing the parameter everywhere. So returning something in this monad amounts to producing a value ignoring the parameter of type r, and binding a value means that you take r and pass it on both to the left and right computation.
This is not what you want for a parser. The input string will be distributed to all computations. So each pitem will operate on the original input string. Furthermore, as
pitem :: String -> [(Char, String)]
the result of your monadic computation will be of type [(Char, String)], so x and y are both of this type. That's why you get the result
[(([('a',"bc")],[('a',"bc")]),"abc")]
You're calling pitem three times on the same input string. You're putting two results in a pair, and you're preturn-ing the whole thing.
How to fix it?
You need to define your own monad instance for the Parser type. You cannot do that directly, because Parser is a type synonym, and type synonyms cannot be partially applied,
so you cannot write
instance Monad Parser where
...
Instead, you have to wrap Parser in a new datatype or newtype:
newtype Parser a = Parser { parse :: String -> [(a, String)] }
This gives you a constructor Parser and a function parse to convert between the unwrapped and wrapped parser types:
Parser :: String -> [(a, String)] -> Parser a
parse :: Parser a -> String -> [(a, String)]
This implies you'll have to adapt your other functions. For example, preturn becomes
preturn :: a -> Parser a
preturn t = Parser (\inp -> [(t,inp)])
Change pfailure and pitem similarly. Then, you have to define the Monad instance:
instance Monad Parser where
return = preturn
(>>=) = ... -- to be completed by you
The function (>>=) is not contained in your code above. You'll want to implement the behaviour that the input is passed to the first parser, and for every result of that, the result and the remaining input are passed to the second argument of (>>=). Once this is done, a call to parse firstlast "abc" will have the following result:
[(('a','c'),"")]
which isn't quite what you want in your question, but I believe it's what you're actually after.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

How do I parse S-expressions into a data structure in Haskell? - parsing

Related

Parser written in Haskell not working as intended

Haskell : Operator Parser keeps going to undefined rather than inputs

parsec produce strange errors when trying to handle Maybe as ParseError

Parsing simple molecule names with Attoparsec

Haskell Parser not working

Categories

Resources