How to Parse a floating number in Haskell (Parsec.Combinator) - parsing

I'm trying to parse a string in haskell using Parsec.Combinator library. But I can't find how to parse a floating value. My function only reads integer (with digit, but digit only parses a integer value).
parsePoint :: Parser Point
parsePoint = do
string "Point"
x <- many1 digit
char ','
y <- many1 digit
return $ (Point (read x) (read y))
I searched for Text.Parsec.Number library but didn't find examples to how to use.
thanks for reading

There are several Parsers that can parse floats, for example floating :: (Floating f, Stream s m Char) => ParsecT s u m f. The Parser is just an alias for a special case of Parsec, depending on which module you use. For example type Parser = Parsec Text ().
If your Point thus accepts for example two Floats, you can use the parsec3-numbers package and work with:
import Text.Parsec.Char(char, string)
import Text.Parsec.Number(floating)
import Text.Parsec.Text(Parser)
data Point = Point Float Float
parsePoint :: Parser Point
parsePoint = do
string "Point "
x <- floating
char ','
y <- floating
return (Point x y)
This then can parse, in this case a string "Point 3.14,2.718"

Related

Parse letter or number with Parsec

I am trying to write a parser for strings such as x, A (i.e. single letters), 657 and 0 (i.e. integer positive numbers).
Here is the code I wrote.
import Text.Parsec
data Expression = String String | Number Int
value = letter <|> many1 digit
However I get the following error.
Couldn't match type ‘[Char]’ with ‘Char’
How to convert Char -> String inside the parser?
What should the type annotation be for value ?
letter parses just a single letter and returns a Char. You want to parse a String, namely [Char] (it's the same thing), so I guess you want to parse many letter?
But if you want to parse just a single letter as a String you can take advantage of the fact that Parsec _ _ has a Functor instance in order to map over its result and pack it in a list:
value :: Parsec s u String
value = fmap (:[]) letter <|> many1 digit
After the edit I guess you want to parse the Expression you have presented to us, so you will need some more fancy fmapping to wrap the results in proper constructors:
value :: Parsec s u Expression
value = fmap (String . (:[])) letter
<|> fmap (Number . read) (many1 digit)

Why Parsec's sepBy stops and does not parse all elements?

I am trying to parse some comma separated string which may or may not contain a string with image dimensions. For example "hello world, 300x300, good bye world".
I've written the following little program:
import Text.Parsec
import qualified Text.Parsec.Text as PS
parseTestString :: Text -> [Maybe (Int, Int)]
parseTestString s = case parse dimensStringParser "" s of
Left _ -> [Nothing]
Right dimens -> dimens
dimensStringParser :: PS.Parser [Maybe (Int, Int)]
dimensStringParser = (optionMaybe dimensParser) `sepBy` (char ',')
dimensParser :: PS.Parser (Int, Int)
dimensParser = do
w <- many1 digit
char 'x'
h <- many1 digit
return (read w, read h)
main :: IO ()
main = do
print $ parseTestString "300x300,40x40,5x5"
print $ parseTestString "300x300,hello,5x5,6x6"
According to optionMaybe documentation, it returns Nothing if it can't parse, so I would expect to get this output:
[Just (300,300),Just (40,40),Just (5,5)]
[Just (300,300),Nothing, Just (5,5), Just (6,6)]
but instead I get:
[Just (300,300),Just (40,40),Just (5,5)]
[Just (300,300),Nothing]
I.e. parsing stops after first failure. So I have two questions:
Why does it behave this way?
How do I write a correct parser for this case?
In order to answer this question, it's handy to take a piece of paper, write down the input, and act as a dumb parser.
We start with "300x300,hello,5x5,6x6", our current parser is optionMaybe .... Does our dimensParser correctly parse the dimension? Let's check:
w <- many1 digit -- yes, "300"
char 'x' -- yes, "x"
h <- many1 digit -- yes, "300"
return (read w, read h) -- never fails
We've successfully parsed the first dimension. The next token is ,, so sepBy successfully parses that as well. Next, we try to parse "hello" and fail:
w <- many1 digit -- no. 'h' is not a digit. Stop
Next, sepBy tries to parse ,, but that's not possible, since the next token is a 'h', not a ,. Therefore, sepBy stops.
We haven't parsed all the input, but that's not actually necessary. You would get a proper error message if you've used
parse (dimensStringParser <* eof)
Either way, if you want to discard anything in the list that's not a dimension, you can use
dimensStringParser1 :: Parser (Maybe (Int, Int))
dimensStringParser1 = (Just <$> dimensParser) <|> (skipMany (noneOf ",") >> Nothing)
dimensStringParser = dimensStringParser1 `sepBy` char ','
I'd guess that optionMaybe dimensParser, when fed with input "hello,...", tries dimensParser. That fails, so optionMaybe returns success with Nothing, and consumes no portion of the input.
The last part is the crucial one: after Nothing is returned, the input string to be parsed is still "hello,...".
At that point sepBy tries to parse char ',', which fails. So, it deduces that the list is over, and terminates the output list, without consuming any more input.
If you want to skip other entities, you need a "consuming" parser that returns Nothing instead of optionMaybe. That parser, however, need to know how much to consume: in your case, until the comma.
Perhaps you need some like (untested)
( try (Just <$> dimensParser)
<|> (noneOf "," >> return Nothing))
`sepBy` char ','

Please explain the behavior of this Parsec permutation parser

Why does this Parsec permutation parser not parse b?
p :: Parser (String, String)
p = permute (pair
<$?> ("", pa)
<|?> ("", pb))
where pair a b = (a, b)
pa :: Parser String
pa = do
char 'x'
many1 (char 'a')
pb :: Parser String
pb = do
many1 (char 'b')
λ> parseTest p "xaabb"
("aa","bb") -- expected result, good
λ> parseTest p "aabb"
("","") -- why "" for b?
Parser pa is configured as optional via <$?> so I don't understand why its failing has impacted the parsing of b. I can change it to optional (char 'x') to get the expected behavior, but I don't understand why.
pa :: Parser String
pa = do
optional (char 'x')
many1 (char 'a')
pb :: Parser String
pb = do
optional (char 'x')
many1 (char 'b')
λ> parseTest p "xaaxbb"
parse error at (line 1, column 2):
unexpected "a"
expecting "b"
λ> parseTest p "xbbxaa"
("aa","bb")
How can both input orderings be supported when we have identical shared prefix "x"?
I also don't understand the impact that consumption of the optional "x" is having on the parse behavior:
pb :: Parser String
pb = do
try px -- with this try x remains unconsumed and "aa" gets parsed
-- without this try x is consumed, but "aa" isn't parsed even though "x" is optional anyway
many1 (char 'b')
px :: Parser Char
px = do
optional (char 'x')
char 'x' <?> "second x"
λ> parseTest p "xaaxbb" -- without try on px
parse error at (line 1, column 2):
unexpected "a"
expecting second x
λ> parseTest p "xaaxbb" -- with try on px
("aa","")
Why parseTest p "aabb" gives ("","")
The permutation parser tries to strip off the front of the given string prefixes that can be parsed by its constituent parsers (pa and pb in this case). Here, it will have tried to apply both pa and pb to "aabb" and failed in both cases - it never even gets around to trying to parse "bb".
Why can't both pa and pb start with optional (char 'x')
Looking at permute, you'll see it uses choice, which in turn relies on (<|>). As the documentation of (<|>) says,
This combinator implements choice. The parser p <|> q first applies p. If it succeeds, the value of p is returned. If p fails without consuming any input, parser q is tried. This combinator is defined equal to the mplus member of the MonadPlus class and the (<|>) member of Alternative.
The parser is called predictive since q is only tried when parser p didn't consume any input (i.e.. the look ahead is 1). This non-backtracking behaviour allows for both an efficient implementation of the parser combinators and the generation of good error messages.
So when you do something like parseTest p "xbb", pa doesn't fail immediately (it consumes and 'x') and then the whole thing fails because it cannot backtrack.
How to make shared prefixes work?
As Daniel has suggested, it is best to factor out your grammar. Alternately, you can use try:
The parser try p behaves like parser p, except that it pretends that it hasn't consumed any input when an error occurs
Based on what we talked about before for (<|>), you ought then to put try in front of both of optional (char 'x').
Why does this Parsec permutation parser not parse b?
Because 'a' is not a valid first character for either parser pa or parser pb.
How can both input orderings be supported when we have identical shared prefix "x"?
Shared prefixes must be factored out of your grammar; or backtracking points inserted (using try) at the cost of performance.

Make a parser ignore all redundant whitespace

Say I have a Parser p in Parsec and I want to specify that I want to ignore all superfluous/redundant white space in p. Let's for example say that I define a list as starting with "[", end with "]", and in the list are integers separated by white space. But I don't want any errors if there are white space in front of the "[", after the "]", in between the "[" and the first integer, and so on.
In my case, I want this to work for my parser for a toy programming language.
I will update with code if that is requested/necessary.
Just surround everything with space:
parseIntList :: Parsec String u [Int]
parseIntList = do
spaces
char '['
spaces
first <- many1 digit
rest <- many $ try $ do
spaces
char ','
spaces
many1 digit
spaces
char ']'
return $ map read $ first : rest
This is a very basic one, there are cases where it'll fail (such as an empty list) but it's a good start towards getting something to work.
#Joehillen's suggestion will also work, but it requires some more type magic to use the token features of Parsec. The definition of spaces matches 0 or more characters that satisfies Data.Char.isSpace, which is all the standard ASCII space characters.
Use combinators to say what you mean:
import Control.Applicative
import Text.Parsec
import Text.Parsec.String
program :: Parser [[Int]]
program = spaces *> many1 term <* eof
term :: Parser [Int]
term = list
list :: Parser [Int]
list = between listBegin listEnd (number `sepBy` listSeparator)
listBegin, listEnd, listSeparator :: Parser Char
listBegin = lexeme (char '[')
listEnd = lexeme (char ']')
listSeparator = lexeme (char ',')
lexeme :: Parser a -> Parser a
lexeme parser = parser <* spaces
number :: Parser Int
number = lexeme $ do
digits <- many1 digit
return (read digits :: Int)
Try it out:
λ :l Parse.hs
Ok, modules loaded: Main.
λ parseTest program " [1, 2, 3] [4, 5, 6] "
[[1,2,3],[4,5,6]]
This lexeme combinator takes a parser and allows arbitrary whitespace after it. Then you only need to use lexeme around the primitive tokens in your language such as listSeparator and number.
Alternatively, you can parse the stream of characters into a stream of tokens, then parse the stream of tokens into a parse tree. That way, both the lexer and parser can be greatly simplified. It’s only worth doing for larger grammars, though, where maintainability is a concern; and you have to use some of the lower-level Parsec API such as tokenPrim.

Parsing in Haskell for a simple interpreter

I'm relatively new to Haskell with main programming background coming from OO languages. I am trying to write an interpreter with a parser for a simple programming language. So far I have the interpreter at a state which I am reasonably happy with, but am struggling slightly with the parser.
Here is the piece of code which I am having problems with
data IntExp
= IVar Var
| ICon Int
| Add IntExp IntExp
deriving (Read, Show)
whitespace = many1 (char ' ')
parseICon :: Parser IntExp
parseICon =
do x <- many (digit)
return (ICon (read x :: Int))
parseIVar :: Parser IntExp
parseIVar =
do x <- many (letter)
prime <- string "'" <|> string ""
return (IVar (x ++ prime))
parseIntExp :: Parser IntExp
parseIntExp =
do x <- try(parseICon)<|>try(parseIVar)<|>parseAdd
return x
parseAdd :: Parser IntExp
parseAdd =
do x <- parseIntExp
whitespace
string "+"
whitespace
y <- parseIntExp
return (Add x y)
runP :: Show a => Parser a -> String -> IO ()
runP p input
= case parse p "" input of
Left err ->
do putStr "parse error at "
print err
Right x -> print x
The language is slightly more complex, but this is enough to show my problem.
So in the type IntExp ICon is a constant and IVar is a variable, but now onto the problem. This for example runs successfully
runP parseAdd "5 + 5"
which gives (Add (ICon 5) (ICon 5)), which is the expected result. The problem arises when using IVars rather than ICons eg
runP parseAdd "n + m"
This causes the program to error out saying there was an unexpected "n" where a digit was expected. This leads me to believe that parseIntExp isn't working as I intended. My intention was that it will try to parse an ICon, if that fails then try to parse an IVar and so on.
So I either think the problem exists in parseIntExp, or that I am missing something in parseIVar and parseICon.
I hope I've given enough info about my problem and I was clear enough.
Thanks for any help you can give me!
Your problem is actually in parseICon:
parseICon =
do x <- many (digit)
return (ICon (read x :: Int))
The many combinator matches zero or more occurrences, so it's succeeding on "m" by matching zero digits, then probably dying when read fails.
And while I'm at it, since you're new to Haskell, here's some unsolicited advice:
Don't use spurious parentheses. many (digit) should just be many digit. Parentheses here just group things, they're not necessary for function application.
You don't need to do ICon (read x :: Int). The data constructor ICon can only take an Int, so the compiler can figure out what you meant on its own.
You don't need try around the first two options in parseIntExp as it stands--there's no input that would result in either one consuming some input before failing. They'll either fail immediately (which doesn't need try) or they'll succeed after matching a single character.
It's usually a better idea to tokenize first before parsing. Dealing with whitespace at the same time as syntax is a headache.
It's common in Haskell to use the ($) operator to avoid parentheses. It's just function application, but with very low precedence, so that something like many1 (char ' ') can be written many1 $ char ' '.
Also, doing this sort of thing is redundant and unnecessary:
parseICon :: Parser IntExp
parseICon =
do x <- many digit
return (ICon (read x))
When all you're doing is applying a regular function to the result of a parser, you can just use fmap:
parseICon :: Parser IntExp
parseICon = fmap (ICon . read) (many digit)
They're the exact same thing. You can make things look even nicer if you import the Control.Applicative module, which gives you an operator version of fmap, called (<$>), as well as another operator (<*>) that lets you do the same thing with functions of multiple arguments. There's also operators (<*) and (*>) that discard the right or left values, respectively, which in this case lets you parse something while discarding the result, e.g., whitespace and such.
Here's a lightly modified version of your code with some of the above suggestions applied and some other minor stylistic tweaks:
whitespace = many1 $ char ' '
parseICon :: Parser IntExp
parseICon = ICon . read <$> many1 digit
parseIVar :: Parser IntExp
parseIVar = IVar <$> parseVarName
parseVarName :: Parser String
parseVarName = (++) <$> many1 letter <*> parsePrime
parsePrime :: Parser String
parsePrime = option "" $ string "'"
parseIntExp :: Parser IntExp
parseIntExp = parseICon <|> parseIVar <|> parseAdd
parsePlusWithSpaces :: Parser ()
parsePlusWithSpaces = whitespace *> string "+" *> whitespace *> pure ()
parseAdd :: Parser IntExp
parseAdd = Add <$> parseIntExp <* parsePlusWithSpaces <*> parseIntExp
I'm also new to Haskell, just wondering:
will parseIntExp ever make it to parseAdd?
It seems like ICon or IVar will always get parsed before reaching 'parseAdd'.
e.g. runP parseIntExp "3 + m"
would try parseICon, and succeed, giving
(ICon 3) instead of (Add (ICon 3) (IVar m))
Sorry if I'm being stupid here, I'm just unsure.

Resources