Haskell (Parsing) code can't understand how it works - parsing

openBr = char '['
closeBr = char ']'
openPn = char '('
closePn = char ')'
star = char '*'
myParser =
(many1 star >>=
\vs -> myParser >>=
\x -> return (x+length vs)
) +++
(openBr >>
myParser >>=
\c -> closeBr >>
myParser >>=
\d -> return (c+d)
) +++
(openPn >>
myParser >>=
\c -> closePn >>
myParser >>=
\d -> return (c+d)
) +++
return 0
parse myParser "*(***[*(**)]*)*"
-- outputs ([9,""])
parse myParser "*(***[*(**]*)*"
-- outputs ([1, "(***[*(**]*)*"])
when there is a matching bracket and parentheses it returns the number of stars however it returns like the second output. I don't understand how the code works. Can someone explain it for me?

First of all, when writing monadic Haskell code like this, It is idiomatic to use do notation. This reduces noise and makes your intention clearer.
Rewritten using do notation, your parser would look like this:
myParser =
-- run of stars
(do
vs <- many1 star
x <- myParser
return (x+length vs)
) +++
-- brackets
(do
openBr
c <- myParser
closeBr
d <- myParser
return (c+d)
) +++
-- parentheses
(do
openPn
c <- myParser
closePn
d <- myParser
return (c+d)
) +++
-- fail
return 0
Now, when you parse the second example, the parser correctly parses the first star, and tots the total up to 1. It then sees the open bracket, so it chooses the bracket choice.
The brackets are not balanced however, so the bracket parser fails, despite being able to parse a some of the characters following the open bracket. This causes the failure to bubble upward and fail the overall parse. When this happens, the parse function returns the return value accumulated to date, i.e. the number of stars before the failure, as well as the remainder of the string which failed to parse.
What you want to happen in this situation is for the parse to fail and return 0. Since your parsing library doesn't automatically backtrack, you'll need to explicitly enable backtracking by wrapping the subparsers in your library's equivalent of the try combinator.
Happy Haskelling!

Related

Passing argument to a ReadP Parser in Haskell

I am trying to create a parser from scratch in Haskell. I have problems passing a string as an argument to a function that is already part of a do block in which the parsing occurs. Why does the following Minimal viable example code return [] and not 4 as expected.
import Data.Char
import Text.ParserCombinators.ReadP
import Control.Applicative ((<|>))
type Parser a = ReadP a
token :: Parser a -> Parser a
token combinator = (do spaces
combinator)
space :: Parser Char
space = satisfy isSpace
spaces :: Parser String
spaces = many space
parseString input = readP_to_S (do
e <- pExpr
token eof
return e) input
pExpr = (do
pv <- pOpHelper
spaces
str <- string pv
return str
)
pOpHelper :: Parser String
pOpHelper = (do
e1 <- munch isDigit
return e1
)
I am of course interested in returning a processed version of whatever string pv returns. However I can't understand why the current setup wouldn't return anything besides [] on parseString "4" since calling just pOpHelper wihtout pExpr seems to work.
Edit
I think I have located the 'bug' to be part of the string function. I had a closer look at it here but I can't see from the documentation why it shouldn't work in the above. But the above code is narrowed down to the parts that produce the unintended outputs as specified.
EDIT EDIT
I have now narrowed the problem down even further. It has to do with how 'consumption' works for the parser. The problem is that if I give it parseString "4" the string pv expects the "4" that is returned by pv, but it will still be parsing the next characters on which munch isDigit is no longer satisfies. This means that it will only return [("4","")] rather than [] if the input is parseString "4 4", and only if the spaces has been added to the do-clause in pExpr.
But how can I work around this and avoid 'consuming' the string that I put as input. Is there a way to use look for instance, in the above documentation.
As pointed out in the comments below I am interested in transforming whatever is the input to pOpHelper and then passing its output to functions (in a recursion) that is part of the parent parser-function called. But how can I do it without consuming the input with pOpHelper first such that the following example would return str on input of "4":
pExpr = (do
pv <- pOpHelper
--spaces
str <- string pv
if str == "(4)" then return str -- do stuff!
else pfail
)
pOpHelper :: Parser String
pOpHelper = (do
e1 <- munch isDigit
return ( "(" ++ e1 ++ ")" )
)

Skip everything until a successful parse

I'd like to parse all days from a text like this:
Ignore this
Also this
2019-09-05
More to ignore
2019-09-06
2019-09-07
Using Trifecta, I've defined a function to parse a day:
dayParser :: Parser Day
dayParser = do
dayString <- tillEnd
parseDay dayString
tillEnd :: Parser String
tillEnd = manyTill anyChar (try eof <|> eol)
parseDay :: String -> Parser Day
parseDay s = maybe failure return dayMaybe
where
dayMaybe = parseTime' dayFormat s
failure = unexpected $ "Failed to parse date. Expected format: " ++ dayFormat
-- %-m makes the parser accept months consisting of a single digit
dayFormat = "%Y-%-m-%-d"
eol :: Parser ()
eol = char '\n' <|> char '\r' >> return ()
-- "%Y-%-m-%-d" for example
type TimeFormat = String
-- Given a time format and a string, parses the string to a time.
parseTime' :: (Monad m, ParseTime t) => TimeFormat -> String -> m t
-- True means that the parser tolerates whitespace before and after the date
parseTime' = parseTimeM True defaultTimeLocale
Parsing a day this way works. What I'm having trouble with is ignoring anything in the text that's not a day.
The following can't work since it assumes the number of text blocks that aren't a day:
daysParser :: Parser [Day]
daysParser = do
-- Ignore everything that's not a day
_ <- manyTill anyChar $ try dayParser
days <- many $ token dayParser
_ <- manyTill anyChar $ try dayParser
-- There might be more days after this...
return days
I reckon there's a straightforward way to express this with Trifecta but I can't seem to find it.
Here's the whole module including an example text to parse:
{-# LANGUAGE QuasiQuotes #-}
module DateParser where
import Text.RawString.QQ
import Data.Time
import Text.Trifecta
import Control.Applicative ( (<|>) )
-- "%Y-%-m-%-d" for example
type TimeFormat = String
dayParser :: Parser Day
dayParser = do
dayString <- tillEnd
parseDay dayString
tillEnd :: Parser String
tillEnd = manyTill anyChar (try eof <|> eol)
parseDay :: String -> Parser Day
parseDay s = maybe failure return dayMaybe
where
dayMaybe = parseTime' dayFormat s
failure = unexpected $ "Failed to parse date. Expected format: " ++ dayFormat
-- %-m makes the parser accept months consisting of a single digit
dayFormat = "%Y-%-m-%-d"
eol :: Parser ()
eol = char '\n' <|> char '\r' >> return ()
-- Given a time format and a string, parses the string to a time.
parseTime' :: (Monad m, ParseTime t) => TimeFormat -> String -> m t
-- True means that the parser tolerates whitespace before and after the date
parseTime' = parseTimeM True defaultTimeLocale
daysParser :: Parser [Day]
daysParser = do
-- Ignore everything that's not a day
_ <- manyTill anyChar $ try dayParser
days <- many $ token dayParser
_ <- manyTill anyChar $ try dayParser
-- There might be more days after this...
return days
test = parseString daysParser mempty text1
text1 = [r|
Ignore this
Also this
2019-09-05
More to ignore
2019-09-06
2019-09-07|]
There are three large problems here.
First, the way you're defining dayParser, it's always trying to parse the rest of the text as a date. For example, if your input text is "2019-01-01 foo bar", then dayParser would first consume the whole string, so that dayString == "2019-01-01 foo bar", and then will try to parse that string as a date. Which, of course, would fail.
In order to have a saner behavior, you could only bite off the beginning of the string that kinda looks like a date and try to parse that, like:
dayParser =
parseDay =<< many (digit <|> char '-')
This implementation bites off the beginning of the input consisting of digits and dashes, and tries to parse that as a date.
Note that this is a quick-n-dirty implementation. It is imprecise. For example, this implementation would accept input like "2019-01-0123456" and try to parse that as a date, and of course will fail. From your question, it is not clear whether you'd want to still parse 2019-01-01 and leave the rest, or whether you want to not consider that a proper date. If you wanted to be super-precise about this, you could specify the exact format as precisely as you want, e.g.:
dayParser = do
y <- count 4 digit
void $ char '-'
m <- try (count 2 digit) <|> count 1 digit
void $ char '-'
d <- try (count 2 digit) <|> count 1 digit
parseDay $ y ++ "-" ++ m ++ "-" ++ d
This implementation expects exactly the format of the date.
Second, there is a logical problem: your daysParser tries to first parse some garbage, then parse many days, and then parse some garbage again. This logic does not admit a case where the many dates have some garbage between them.
Third problem is much more tricky. You see, the way the try combinator works - if the parser fails, then try will roll back the input position, but if the parser succeeds, then the input remains consumed! This means that you cannot use try as a zero-consumption lookahead, the way you're trying to do in manyTill anyChar $ try dayParser. Such a parser will parse until it finds a date, and then it will consume the date, leaving nothing for the next parser and causing it to fail.
I will illustrate with a simpler example. Consider this:
> parseString (many (char 'a')) mempty "aaa"
Success "aaa"
Cool, it parses three 'a's. Now let's add a try at the beginning:
> parseString (try (char 'b') *> many (char 'a')) mempty "aaa"
Success "aaa"
Awesome, this still works: the try fails, and then we parse three 'a's as before.
Now let's change the try from 'b' to 'a':
> parseString (try (char 'a') *> many (char 'a')) mempty "aaa"
Success "aa"
Look what happened: the try has consumed the first 'a', leaving only two to be parsed by many.
We can even extend it to more fully resemble your approach:
> p = manyTill anyChar (try (char 'a')) *> many (char 'a')
> parseString p mempty "aaa"
Success "aa"
> parseString p mempty "cccaaa"
Success "aa"
See what happens? manyTill correctly skips all the 'c's up to the first 'a', but then it also consumes that first 'a'!
There appears to be no sane way (that I see) to have a zero-consumption lookahead like this. You always have to consume the first successful hit.
If I had this problem, I would probably resort to recursion: parsing chars one by one, at every step looking if I can get a day, and concatenating in a list. Something like this:
data WhatsThis = AChar Char | ADay Day | EOF
daysParser = do
r <- (ADay <$> dayParser) <|> (AChar <$> anyChar) <|> (EOF <$ eof)
case r of
ADay d -> do
rest <- daysParser
pure $ d : rest
AChar _ ->
daysParser
EOF ->
pure []
It tries to parse a day, and if that fails, just skips a char, unless there are no more chars. If day parsing succeeded, it calls itself recursively, then prepends the day to the result of the recursive call.
Note that this approach is not very composable: it always consumes everything till the end of the input. If you want to compose it with something else, you may want consider replacing eof with a parameter:
daysParser stop = do
r <- (ADay <$> dayParser) <|> (AChar <$> anyChar) <|> (EOF <$ stop)
...

Parser in Haskell for Triples

I am currently doing an assignment about Parsing in Haskell, but I am struggling with some of the basics.
Assignment :
I am supposed to create a function which parses a string into a list of Triples.
So that:
A, B, C
,E ,D
would result in
Triples [("A","B","C"), ("A","E","D")]
The input string is going to include ;\n as an indication for the beginng of a new Triple. The string is going to end with a dot.
The elements of the Triples can be letters or digits or combination,
e.g. abc, a, 1, abc121.
Therefore,
"a,b,c;\n d,e;\n f,g;\n h,i."
would result in:
Triples [("a","b","c"),("a","d","e"),("a","f","g"),("a","h","i")]
My current solution:
parseTriplesD :: Parser Triples
parseTriplesD = parseTriples
>>= \rs -> return (Triples rs)
This function is pretty simple and correct. Takes the string and returns a object of the newtype Triples with the List of Triples created by parseTriples.
parseTriples :: Parser [Triple]
parseTriples = parseTriple
>>= \r -> ((string ";\n" >> parseTriples >>= \rs -> return (r:rs))
P.<|>(return[r]))
This function needs some work. My idea is that I use another function which creates a Triple with tree Elements of the input string, ignores the /n and recursivly calls it self while adding the created triples to a return list. When this does not work because it can only create one Triple, it returns a list with the Triple.
I somehow need to create the first Triple, and then use first element of this triple as the first element of the other ones.
Question 1
How do I create the first Triple and use the first Elements of the Triple for the other Triples?
parseTriple :: Parser Triple
parseTriple = P.many (letter<|>digit) >>= \a -> P.char ','
>> P.many (letter<|>digit)>>= \b -> P.char ','
>> P.many (letter<|>digit)>>= \c -> return ((a,b,c))
This function is pretty simple but I am not sure if its correct.
My idea is that it takes the first couple of characters of the string which are either a letter or a digit, up until the comma "," , and saves these charcters in a.
It is repeated 3 times, and the creates and returns a Triple with the three elements.
Question 2
How do I take only a few characters (which are either a letter or a digit EDIT: Or A SPACE Character) of the string up until the comma?
Is P.many (letter<|>digit) correct?
What we are given:
The Triples data structue:
newtype Triples = Triples [Triple] deriving (Show,Eq)
type Triple = (String, String, String)
Imports:
import Test.HUnit (runTestTT,Test(TestLabel,TestList),(~?=))
import qualified Text.Parsec as P (char,runP,noneOf,many,(<|>),eof)
import Text.ParserCombinators.Parsec
import Text.Parsec.String
import Text.Parsec.Char
import Data.Maybe
Test cases
runParsec :: Parser a -> String -> Maybe a
runParsec parser input = case P.runP parser () "" input of
Left _ -> Nothing
Right a -> Just a
-- | Tests the implementations of 'parseScore'.
main :: IO ()
main = do
testresults <- runTestTT tests
print testresults
-- | List of tests for 'parseScore'.
tests :: Test
tests = TestLabel "parseScoreTest" (TestList [
runParsec parseTriplesD "0,1,2;\n2,3." ~?= Just (Triples [("0","1","2"),("0","2","3")]),
runParsec parseTriplesD "a,bcde ,23." ~?= Just (Triples [("a","bcde ","23")]),
runParsec parseTriplesD "a,b,c;\n d,e;\n f,g;\n h,i." ~?= Just (Triples [("a","b","c"),("a","d","e"),("a","f","g"),("a","h","i")]),
runParsec parseTriplesD "a,bcde23." ~?= Nothing,
runParsec parseTriplesD "a,b,c;d,e;f,g;h,i." ~?= Nothing,
runParsec parseTriplesD "a,b,c;\nd;\nf,g;\nh,i." ~?= Nothing
])
What you could do is:
Parse the first character
Parse a list of pairs
Add the first character to each of the pairs to create triples
Using do notation will make your code more readable.
You can use alphaNum as a shorthand for letter <|> digit.
parseTriplesD :: Parser Triples
parseTriplesD = Triples <$> parseTriples
parseTriples :: Parser [Triple]
parseTriples = do
a <- parseString
char ','
pairs <- parsePair `sepBy1` string ";\n"
char '.'
eof
return (map (\(b, c) -> (a, b, c)) pairs)
parsePair :: Parser (String, String)
parsePair = do
first <- parseString
char ','
second <- parseString
return (first, second)
parseString :: Parser String
parseString = many (char ' ') >> many (alphaNum <|> char ' ')

Haskell read variable name

I need to write a code that parses some language. I got stuck on parsing variable name - it can be anything that is at least 1 char long, starts with lowercase letter and can contain underscore '_' character. I think I made a good start with following code:
identToken :: Parser String
identToken = do
c <- letter
cs <- letdigs
return (c:cs)
where letter = satisfy isLetter
letdigs = munch isLetter +++ munch isDigit +++ munch underscore
num = satisfy isDigit
underscore = \x -> x == '_'
lowerCase = \x -> x `elem` ['a'..'z'] -- how to add this function to current code?
ident :: Parser Ident
ident = do
_ <- skipSpaces
s <- identToken
skipSpaces; return $ s
idents :: Parser Command
idents = do
skipSpaces; ids <- many1 ident
...
This function however gives me a weird results. If I call my test function
test_parseIdents :: String -> Either Error [Ident]
test_parseIdents p =
case readP_to_S prog p of
[(j, "")] -> Right j
[] -> Left InvalidParse
multipleRes -> Left (AmbiguousIdents multipleRes)
where
prog :: Parser [Ident]
prog = do
result <- many ident
eof
return result
like this:
test_parseIdents "test"
I get this:
Left (AmbiguousIdents [(["test"],""),(["t","est"],""),(["t","e","st"],""),
(["t","e","st"],""),(["t","est"],""),(["t","e","st"],""),(["t","e","st"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],"")])
Note that Parser is just synonym for ReadP a.
I also want to encode in the parser that variable names should start with a lowercase character.
Thank you for your help.
Part of the problem is with your use of the +++ operator. The following code works for me:
import Data.Char
import Text.ParserCombinators.ReadP
type Parser a = ReadP a
type Ident = String
identToken :: Parser String
identToken = do c <- satisfy lowerCase
cs <- letdigs
return (c:cs)
where lowerCase = \x -> x `elem` ['a'..'z']
underscore = \x -> x == '_'
letdigs = munch (\c -> isLetter c || isDigit c || underscore c)
ident :: Parser Ident
ident = do _ <- skipSpaces
s <- identToken
skipSpaces
return s
test_parseIdents :: String -> Either String [Ident]
test_parseIdents p = case readP_to_S prog p of
[(j, "")] -> Right j
[] -> Left "Invalid parse"
multipleRes -> Left ("Ambiguous idents: " ++ show multipleRes)
where prog :: Parser [Ident]
prog = do result <- many ident
eof
return result
main = print $ test_parseIdents "test_1349_zefz"
So what went wrong:
+++ imposes an order on its arguments, and allows for multiple alternatives to succeed (symmetric choice). <++ is left-biased so only the left-most option succeeds -> this would remove the ambiguity in the parse, but still leaves the next problem.
Your parser was looking for letters first, then digits, and finally underscores. Digits after underscores failed, for example. The parser had to be modified to munch characters that were either letters, digits or underscores.
I also removed some functions that were unused and made an educated guess for the definition of your datatypes.

Haskell: if-then-else parse error

I'm working on an instance of Read ComplexInt.
Here's what was given:
data ComplexInt = ComplexInt Int Int
deriving (Show)
and
module Parser (Parser,parser,runParser,satisfy,char,string,many,many1,(+++)) where
import Data.Char
import Control.Monad
import Control.Monad.State
type Parser = StateT String []
runParser :: Parser a -> String -> [(a,String)]
runParser = runStateT
parser :: (String -> [(a,String)]) -> Parser a
parser = StateT
satisfy :: (Char -> Bool) -> Parser Char
satisfy f = parser $ \s -> case s of
[] -> []
a:as -> [(a,as) | f a]
char :: Char -> Parser Char
char = satisfy . (==)
alpha,digit :: Parser Char
alpha = satisfy isAlpha
digit = satisfy isDigit
string :: String -> Parser String
string = mapM char
infixr 5 +++
(+++) :: Parser a -> Parser a -> Parser a
(+++) = mplus
many, many1 :: Parser a -> Parser [a]
many p = return [] +++ many1 p
many1 p = liftM2 (:) p (many p)
Here's the given exercise:
"Use Parser to implement Read ComplexInt, where you can accept either the simple integer
syntax "12" for ComplexInt 12 0 or "(1,2)" for ComplexInt 1 2, and illustrate that read
works as expected (when its return type is specialized appropriately) on these examples.
Don't worry (yet) about the possibility of minus signs in the specification of natural
numbers."
Here's my attempt:
data ComplexInt = ComplexInt Int Int
deriving (Show)
instance Read ComplexInt where
readsPrec _ = runParser parseComplexInt
parseComplexInt :: Parser ComplexInt
parseComplexInt = do
statestring <- getContents
case statestring of
if '(' `elem` statestring
then do process1 statestring
else do process2 statestring
where
process1 ststr = do
number <- read(dropWhile (not(isDigit)) ststr) :: Int
return ComplexInt number 0
process2 ststr = do
numbers <- dropWhile (not(isDigit)) ststr
number1 <- read(takeWhile (not(isSpace)) numbers) :: Int
number2 <- read(dropWhile (not(isSpace)) numbers) :: Int
return ComplexInt number1 number2
Here's my error (my current error, as I'm sure there will be more once I sort this one out, but I'll take this one step at time):
Parse error in pattern: if ')' `elem` statestring then
do { process1 statestring }
else
do { process2 statestring }
I based my structure of the if-then-else statement on the structure used in this question: "parse error on input" in Haskell if-then-else conditional
I would appreciate any help with the if-then-else block as well as with the code in general, if you see any obvious errors.
Let's look at the code around the parse error.
case statestring of
if '(' `elem` statestring
then do process1 statestring
else do process2 statestring
That's not how case works. It's supposed to be used like so:
case statestring of
"foo" -> -- code for when statestring == "foo"
'b':xs -> -- code for when statestring begins with 'b'
_ -> -- code for none of the above
Since you're not making any sort of actual use of the case, just get rid of the case line entirely.
(Also, since they're only followed by a single statement each, the dos after then and else are superfluous.)
You stated you were given some functions to work with, but then didn't use them! Perhaps I misunderstood. Your code seems jumbled and doesn't seem to achieve what you would like it to. You have a call to getContents, which has type IO String but that function is supposed to be in the parser monad, not the io monad.
If you actually would like to use them, here is how:
readAsTuple :: Parser ComplexInt
readAsTuple = do
_ <- char '('
x <- many digit
_ <- char ','
y <- many digit
_ <- char ')'
return $ ComplexInt (read x) (read y)
readAsNum :: Parser ComplexInt
readAsNum = do
x <- many digit
return $ ComplexInt (read x) 0
instance Read ComplexInt where
readsPrec _ = runParser (readAsTuple +++ readAsNum)
This is fairly basic, as strings like " 42" (ones with spaces) will fail.
Usage:
> read "12" :: ComplexInt
ComplexInt 12 0
> read "(12,1)" :: ComplexInt
ComplexInt 12 1
The Read type-class has a method called readsPrec; defining this method is sufficient to fully define the read instance for the type, and gives you the function read automatically.
What is readsPrec?
readsPrec :: Int -> String -> [(a, String)].
The first parameter is the precedence context; you can think of this as the precedence of the last thing that was parsed. This can range from 0 to 11. The default is 0. For simple parses like this you don't even use it. For more complex (ie recursive) datatypes, changing the precedence context may change the parse.
The second parameter is the input string.
The output type is the possible parses and string remaining a parse terminates. For example:
>runStateT (char 'h') "hello world"
[('h',"ello world")]
Note that parsing is not-deterministic; every matching parse is returned.
>runStateT (many1 (char 'a')) "aa"
[("a","a"),("aa","")]
A parse is considered successful if the return list is a singleton list whose second value is the empty string; namely: [(x, "")] for some x. Empty lists, or lists where any of the remaining strings are not the empty string, give the error no parse and lists with more than one value give the error ambiguous parse.

Resources