How to parse integer with base prefix using parsec in haskell? - parsing

I'm trying to parse an input integer string in haskell using parsec. The string might either be in decimal, octal or hexadecimal. The base is specified by a prefix of #d, #o or #x for decimal, octal and hexadecimal respectively, which is then followed by the integer. If no prefix is specified, the base is assumed to be 10. Here's what I've done so far:
parseNumber = do x <- noPrefix <|> withPrefix
return x
where noPrefix = many1 digit
withPrefix = do char '#'
prefix <- oneOf "dox"
return $ case prefix of
'd' -> many1 digit
'o' -> fmap (show . fst . head . readOct) (many1 octDigit)
'x' -> fmap (show . fst . head . readHex) (many1 hexDigit)
However, this isn't compiling and is failing with type errors. I don't quite really understand the type error and would just like help in general with this problem. Any alternative ways to solve it will also be appreciated.
Thank you for your time and help. :)
EDIT: Here's the error I've been getting.

In Megaparsec—a modern
fork of Parsec, this problem is non-existent (from
documentation of hexadecimal):
Parse an integer in hexadecimal representation. Representation of
hexadecimal number is expected to be according to Haskell report except
for the fact that this parser doesn't parse “0x” or “0X” prefix. It is
responsibility of the programmer to parse correct prefix before parsing
the number itself.
For example you can make it conform to Haskell report like this:
hexadecimal = char '0' >> char' 'x' >> L.hexadecimal
So in your case you can just define (note how it's more readable):
import Data.Void
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L
type Parser = Parsec Void String
parseNumber :: Parser Integer
parseNumber = choice
[ L.decimal
, (string "o#" *> L.octal) <?> "octal integer"
, (string "d#" *> L.decimal) <?> "decimal integer"
, (string "h#" *> L.hexadecimal) <?> "hexadecimal integer" ]
Let's try the parser (note quality of error messages):
λ> parseTest' (parseNumber <* eof) ""
1:1:
|
1 | <empty line>
| ^
unexpected end of input
expecting decimal integer, hexadecimal integer, integer, or octal integer
λ> parseTest' (parseNumber <* eof) "d#3"
3
λ> parseTest' (parseNumber <* eof) "h#ff"
255
λ> parseTest' (parseNumber <* eof) "o#8"
1:3:
|
1 | o#8
| ^
unexpected '8'
expecting octal integer
λ> parseTest' (parseNumber <* eof) "o#77"
63
λ> parseTest' (parseNumber <* eof) "190"
190
Full-disclosure: I'm the author/maintainer of Megaparsec.

You have two slight errors:
One indention error (return x must be indented compared to do) and the parsers in withPrefix must not be returned, since they will return their results anyway.
parseNumber = do x <- noPrefix <|> withPrefix
return x
where noPrefix = many1 digit
withPrefix = do char '#'
prefix <- oneOf "dox"
case prefix of
'd' -> many1 digit
'o' -> fmap (show . fst . head . readOct) (many1 octDigit)
'x' -> fmap (show . fst . head . readHex) (many1 hexDigit)
This should work

Related

How to preserve factorial computing when parsing string with Parsec in Haskell?

I ran into difficulties trying resolve of factorial output calculation with Text.Parsec in Haskell. Let's have a look at some code I use so far:
import Text.Parsec hiding(digit)
import Data.Functor
type CalcParser a = CalcParsec String () a
digit :: CalcParser Char
digit = oneOf ['0'..'9']
number :: CalcParser Double
number = try calcDouble <|> calcInt
calcInt :: CalcParser Double
calcInt = read <$> many1 digit
calcDouble :: CalcParser Double
calcDouble = do
integ <- many1 digit
char '.'
decim <- many1 digit
pure $ read $ integ <> "." <> decim
numberFact :: CalcParser Integer
numberFact = read <$> many1 digit
applyMany :: a -> [a -> a] -> a
applyMany x [] = x
applyMany x (h:t) = applyMany (h x) t
div_ :: CalcParser (Double -> Double -> Double)
div_= do
char '/'
return (/)
star :: CalcParser (Double -> Double -> Double)
star = do
char '*'
return (*)
plus :: CalcParser (Double -> Double -> Double)
plus = do
char '+'
return (+)
minus :: CalcParser (Double -> Double -> Double)
minus = do
char '-'
return (-)
multiplication :: CalcParser Double
multiplication = do
spaces
lhv <- atom <|> fact' <|> negation'
spaces
t <- many tail
return $ applyMany lhv t
where tail =
do
f <- star <|> div_
spaces
rhv <- atom <|> fact' <|> negation'
spaces
return (`f` rhv)
atom :: CalcParser Double
atom = number <|> do
char '('
res <- addition
char ')'
return res
fact' :: CalcParser Double
fact' = do
spaces
rhv <- numberFact
char '!'
return $ factorial rhv
factorial :: Integer -> Double
factorial n
| n < 0 = error "No factorial exists for negative inputs"
| n == 0 || n == 1 = 1
| otherwise = acc n 1
where
acc 0 a = a
acc b a = acc (b-1) (fromIntegral b * a)
negation' :: CalcParser Double
negation' = do
spaces
char '~'
rhv <- atom
spaces
return $ negate rhv
addition :: CalcParser Double
addition = do
spaces
lhv <- multiplication <|> fact' <|> negation'
spaces
t <- many tail
return $ applyMany lhv t
where tail =
do
f <- plus <|> minus
spaces
rhv <- multiplication <|> fact' <|> negation'
spaces
return (`f` rhv)
It's a kind of a simple calculator parser that provides with addition / subtraction / factorial processing. I gonna let input any factorial-linked components of string chunk so that each of them to consist of some number followed by '!' character (as normally in most real world calculators). However, when running a parser test as:
parseTest addition "3!"
it fails to compute factorial itself, but returns 3.0 (output to be presented as a floating point number, which is why I use CalcParser Double in this program). A bizarr fact is that whenever I set '!' in prior to a number:
fact' :: CalcParser Double
fact' = do
spaces
char '!'
rhv <- numberFact
return $ factorial rhv
the result of:
parseTest addition "!3"
meets my expectations, i.e. it equals 6.0. Following that, I assume there is some setback in the first version of stuff which fails to run factorial computing whenever '!' is a next to some number-like input elements. That's just a case I'm looking for any help to solve here. So, what's wrong with right-side '!' and how would you handle the problem decribed?
The general problem is that Parsec stops trying alternatives when it finds the first one that works. If you write:
parseTest (atom <|> fact') "3!"
this will result in 3.0. This is because the atom parser successfully parses the initial part of the string "3", so Parsec doesn't even try the fact' parser. It leaves the rest of the string "!" for another, later parser to handle.
In your parser code above, if you try the parse:
parseTest addition "3!"
the addition parser starts by trying the multiplication parser. The multiplication parser has the line:
lhv <- atom <|> fact' <|> negation'
so it starts by trying the atom parser. This parser works fine and returns 3.0, so it never bothers trying fact' or negation.
To fix your parser, you need to make sure that you don't successfuly parse an atom in an alternative before trying fact'. You could start by switching the order:
> parseTest (fact' <|> atom) "3!"
6.0
This works fine to parse "3!" (and gives 6.0), but it fails if you try to parse something else:
> parseTest (fact' <|> atom) "3"
parse error at (line 1, column 2):
unexpected end of input
expecting "!"
This is because, for efficiency reasons, Parsec doesn't automatically "backtrack". If you try to parse something and it "consumes input" before failing, it fails completely instead of trying more alternatives. Here, fact' starts by calling numberFact which successfully "consumes" the "3", and then it tries char '!', which fails. So, fact "fails after consuming input" which leads to an immediate parse error.
You can override this behavior using the try function:
> parseTest (try fact' <|> atom) "3"
3.0
> parseTest (try fact' <|> atom) "3!"
6.0
Here try fact' applies the parser fact' but treats "fail after consuming input" as if it was "fail after consuming no input", so additional alternatives can be checked.
If you applied this change to both places in your multiplication parser:
multiplication :: CalcParser Double
multiplication = do
spaces
lhv <- try fact' <|> atom <|> negation'
spaces
t <- many tail
return $ applyMany lhv t
where tail =
do
f <- star <|> div_
spaces
rhv <- try fact' <|> atom <|> negation'
spaces
return (`f` rhv)
and made similar changes to your addition parser:
addition :: CalcParser Double
addition = do
spaces
lhv <- try fact' <|> multiplication <|> negation'
spaces
t <- many tail
return $ applyMany lhv t
where tail =
do
f <- plus <|> minus
spaces
rhv <- try fact' <|> multiplication <|> negation'
spaces
return (`f` rhv)
then it would work better:
> parseTest addition "3!"
6.0
> parseTest addition "3"
3.0
> parseTest addition "3+2*6!"
1443.0
It's also a good idea to add an eof parser to your test to make sure that you don't have any left over trailing garbage that wasn't parsed:
> parseTest addition "3 Hey, I could write anything here"
3.0
> parseTest (addition <* eof) "3 but adding eof will generate an error"
parse error at (line 1, column 4):
unexpected 'b'
expecting space, "*", "/", white space, "+", "-" or end of input

How to express parsing logic in Parsec ParserT monad

I was working on "Write Yourself a Scheme in 48 hours" to learn Haskell and I've run into a problem I don't really understand. It's for question 2 from the exercises at the bottom of this section.
The task is to rewrite
import Text.ParserCombinators.Parsec
parseString :: Parser LispVal
parseString = do
char '"'
x <- many (noneOf "\"")
char '"'
return $ String x
such that quotation marks which are properly escaped (e.g. in "This sentence \" is nonsense") get accepted by the parser.
In an imperative language I might write something like this (roughly pythonic pseudocode):
def parseString(input):
if input[0] != "\"" or input[len(input)-1] != "\"":
return error
input = input[1:len(input) - 1] # slice off quotation marks
output = "" # This is the 'zero' that accumulates over the following loop
# If there is a '"' in our string we want to make sure the previous char
# was '\'
for n in range(len(input)):
if input[n] == "\"":
try:
if input[n - 1] != "\\":
return error
catch IndexOutOfBoundsError:
return error
output += input[n]
return output
I've been looking at the docs for Parsec and I just can't figure out how to work this as a monadic expression.
I got to this:
parseString :: Parser LispVal
parseString = do
char '"'
regular <- try $ many (noneOf "\"\\")
quote <- string "\\\""
char '"'
return $ String $ regular ++ quote
But this only works for one quotation mark and it has to be at the very end of the string--I can't think of a functional expression that does the work that my loops and if-statements do in the imperative pseudocode.
I appreciate you taking your time to read this and give me advice.
Try something like this:
dq :: Char
dq = '"'
parseString :: Parser Val
parseString = do
_ <- char dq
x <- many ((char '\\' >> escapes) <|> noneOf [dq])
_ <- char dq
return $ String x
where
escapes = dq <$ char dq
<|> '\n' <$ char 'n'
<|> '\r' <$ char 'r'
<|> '\t' <$ char 't'
<|> '\\' <$ char '\\'
The solution is to define a string literal as a starting quote + many valid characters + an ending quote where a "valid character" is either a an escape sequence or non-quote.
So there is a one line change to parseString:
parseString = do char '"'
x <- many validChar
char '"'
return $ String x
and we add the definitions:
validChar = try escapeSequence <|> satisfy ( /= '"' )
escapeSequence = do { char '\\'; anyChar }
escapeSequence may be refined to allow a limited set of escape sequences.

How can I parse a float with a comma in place of the decimal point?

I want to parse Float values from a file where they are stored using comma as the decimal separator. Thus i need a function myParse :: String -> Float such that, for instance, myParse "23,46" == 23.46.
I have some ideas about how to do this, but they all seem overcomplicated, for example:
replace , with a . in the string and use read; or
follow this FP Complete blogpost (entitled Parsing Floats With Parsec), and challenge the curse of the monomorphism restriction.
Is there a simpler way, or do I really need to use a parsing library? In the second case, could you please paste some suggestions in order to get me started? The monomorphism restriction scares me, and I believe that there has to be a way to do this without using language extensions.
Replacing , by . and then call read is straightforward enough; you just need to remember to use your own specialized function instead of plain old read:
readFloatWithComma :: String -> Float
readFloatWithComma = read . sanitize
where
sanitize = map (\c -> if c == ',' then '.' else c)
In GHCi:
λ> readFloatWithComma "23,46"
23.46
Regarding the parsec approach, despite what the article you link to suggest, the monomorphism restriction needs not be a worry, as long as you have type signatures for all your top-level bindings. In particular, the following code doesn't need any language extensions to compile properly (at least, in GHC 7.10.1):
import Text.Parsec
import Text.Parsec.String ( Parser )
import Control.Applicative hiding ( (<|>) )
infixr 5 <++>
(<++>) :: Applicative f => f [a] -> f [a] -> f [a]
a <++> b = (++) <$> a <*> b
infixr 5 <:>
(<:>) :: Applicative f => f a -> f [a] -> f [a]
a <:> b = (:) <$> a <*> b
number :: Parser String
number = many1 digit
plus :: Parser String
plus = char '+' *> number
minus :: Parser String
minus = char '-' <:> number
integer :: Parser String
integer = plus <|> minus <|> number
float :: Parser Float
float = fmap rd $ integer <++> decimal <++> exponent
where rd = read :: String -> Float
decimal = option "" $ ('.' <$ char ',') <:> number
exponent = option "" $ oneOf "eE" <:> integer
In GHCi:
λ> parseTest float "23,46"
23.46

Why doesn't Parsec allow white spaces before operators in the table for buildExpressionParser

In the code below I can correctly parse white spaces after each of the tokens using Parsec:
whitespace = skipMany (space <?> "")
number :: Parser Integer
number = result <?> "number"
where
result = do {
ds <- many1 digit;
whitespace;
return (read ds)
}
table = result
where
result = [
[Infix (genParser '*' (*)) AssocLeft,
Infix (genParser '/' div) AssocLeft],
[Infix (genParser '+' (+)) AssocLeft,
Infix (genParser '-' (-)) AssocLeft]]
genParser s f = char s >> whitespace >> return f
factor = parenExpr <|> number <?> "parens or number"
where
parenExpr = do {
char '(';
x <- expr;
char ')';
whitespace;
return x
}
expr :: Parser Integer
expr = buildExpressionParser table factor <?> "expression"
However I get a parse error when trying to only parse white spaces before, and after the operators:
whitespace = skipMany (space <?> "")
number :: Parser Integer
number = result <?> "number"
where
result = do {
ds <- many1 digit;
return (read ds)
}
table = result
where
result = [
[Infix (genParser '*' (*)) AssocLeft,
Infix (genParser '/' div) AssocLeft],
[Infix (genParser '+' (+)) AssocLeft,
Infix (genParser '-' (-)) AssocLeft]]
genParser s f = whitespace >> char s >> whitespace >> return f
factor = parenExpr <|> number <?> "parens or number"
where
parenExpr = do {
char '(';
x <- expr;
char ')';
return x
}
expr :: Parser Integer
expr = buildExpressionParser table factor <?> "expression"
The parse error is:
$ ./parsec_example < <(echo "2 * 2 * 3")
"(stdin)" (line 2, column 1):
unexpected end of input
expecting "*"
Why does this happen? Is there some other way to parse white space around just the operators?
When I test your code, 2 * 2 * 3 parses correctly, but 2 + 2 does not. Parsing fails because the parser for * consumes some input and backtracking isn't enabled at that position, so other parsers cannot be tried.
An expression parser created by buildExpressionParser tries to parse each operator in turn until one succeeds. When parsing 2 + 2, the following occurs:
The first 2 is matched by number. The rest of the input is  + 2 (note the space at the beginning).
The parser genParser '*' (*) is applied to the input. It consumes the space, but does not match the + character.
The other infix operator parsers automatically fail because some input was consumed by genParser '*' (*).
You can fix this by wrapping the critical part of the parser in try. This saves the input until after char s succeeds. If char s fails, then buildExpressionParser can backtrack and try another infix operator.
genParser s f = try (whitespace >> char s) >> whitespace >> return f
The drawback of this parser is that, because it backtracks to before the leading whitespace before an infix operator, it repeatedly scans whitespace. It is usually better to parse whitespace after a successful match, like the OP's first parser example.

How do you use parsec in a greedy fashion?

In my work I come across a lot of gnarly sql, and I had the bright idea of writing a program to parse the sql and print it out neatly. I made most of it pretty quickly, but I ran into a problem that I don't know how to solve.
So let's pretend the sql is "select foo from bar where 1". My thought was that there is always a keyword followed by data for it, so all I have to do is parse a keyword, and then capture all gibberish before the next keyword and store that for later cleanup, if it is worthwhile. Here's the code:
import Text.Parsec
import Text.Parsec.Combinator
import Text.Parsec.Char
import Data.Text (strip)
newtype Statement = Statement [Atom]
data Atom = Branch String [Atom] | Leaf String deriving Show
trim str = reverse $ trim' (reverse $ trim' str)
where
trim' (' ':xs) = trim' xs
trim' str = str
printStatement atoms = mapM_ printAtom atoms
printAtom atom = loop 0 atom
where
loop depth (Leaf str) = putStrLn $ (replicate depth ' ') ++ str
loop depth (Branch str atoms) = do
putStrLn $ (replicate depth ' ') ++ str
mapM_ (loop (depth + 2)) atoms
keywords :: [String]
keywords = [
"select",
"update",
"delete",
"from",
"where"]
keywordparser :: Parsec String u String
keywordparser = try ((choice $ map string keywords) <?> "keywordparser")
stuffparser :: Parsec String u String
stuffparser = manyTill anyChar (eof <|> (lookAhead keywordparser >> return ()))
statementparser = do
key <- keywordparser
stuff <- stuffparser
return $ Branch key [Leaf (trim stuff)]
<?> "statementparser"
tp = parse (many statementparser) ""
The key here is the stuffparser. That is the stuff in between the keywords that could be anything from column lists to where criteria. This function catches all characters leading up to a keyword. But it needs something else before it is finished. What if there is a subselect? "select id,(select product from products) from bar". Well in that case if it hits that keyword, it screws everything up, parses it wrong and screws up my indenting. Also where clauses can have parenthesis as well.
So I need to change that anyChar into another combinator that slurps up characters one at a time but also tries to look for parenthesis, and if it finds them, traverse and capture all that, but also if there are more parenthesis, do that until we have fully closed the parenthesis, then concatenate it all and return it. Here's what I've tried, but I can't quite get it to work.
stuffparser :: Parsec String u String
stuffparser = fmap concat $ manyTill somechars (eof <|> (lookAhead keywordparser >> return ()))
where
somechars = parens <|> fmap (\c -> [c]) anyChar
parens= between (char '(') (char ')') somechars
This will error like so:
> tp "select asdf(qwerty) from foo where 1"
Left (line 1, column 14):
unexpected "w"
expecting ")"
But I can't think of any way to rewrite this so that it works. I've tried to use manyTill on the parenthesis part, but I end up having trouble getting it to typecheck when I have both string producing parens and single chars as alternatives. Does anyone have any suggestions on how to go about this?
Yeah, between might not work for what you're looking for. Of course, for your use case, I'd follow hammar's suggestion and grab an off-the-shelf SQL parser. (personal opinion: or, try not to use SQL unless you really have to; the idea to use strings for database queries was imho a historical mistake).
Note: I add an operator called <++> which will concatenate the results of two parsers, whether they are strings or characters. (code at bottom.)
First, for the task of parsing parenthesis: the top level will parse some stuff between the relevant characters, which is exactly what the code says,
parseParen = char '(' <++> inner <++> char ')'
Then, the inner function should parse anything else: non-parens, possibly including another set of parenthesis, and non-paren junk that follows.
parseParen = char '(' <++> inner <++> char ')' where
inner = many (noneOf "()") <++> option "" (parseParen <++> inner)
I'll make the assumption that for the rest of the solution, what you want to do is analgous to splitting things up by top-level SQL keywords. (i.e. ignoring those in parenthesis). Namely, we'll have a parser that will behave like so,
Main> parseTest parseSqlToplevel "select asdf(select m( 2) fr(o)m w where n) from b where delete 4"
[(Select," asdf(select m( 2) fr(o)m w where n) "),(From," b "),(Where," "),(Delete," 4")]
Suppose we have a parseKw parser that will get the likes of select, etc. After we consume a keyword, we need to read until the next [top-level] keyword. The last trick to my solution is using the lookAhead combinator to determine whether the next word is a keyword, and put it back if so. If it's not, then we consume a parenthesis or other character, and then recurse on the rest.
-- consume spaces, then eat a word or parenthesis
parseOther = many space <++>
(("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
option "" ((parseParen <|> many1 (noneOf "() \t")) <++> parseOther))
My entire solution is as follows
-- overloaded operator to concatenate string results from parsers
class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id
infixl 4 <++>
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g
data Keyword = Select | Update | Delete | From | Where deriving (Eq, Show)
parseKw =
(Select <$ string "select") <|>
(Update <$ string "update") <|>
(Delete <$ string "delete") <|>
(From <$ string "from") <|>
(Where <$ string "where") <?>
"keyword (select, update, delete, from, where)"
-- consume spaces, then eat a word or parenthesis
parseOther = many space <++>
(("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
option "" ((parseParen <|> many1 (noneOf "() \t")) <++> parseOther))
parseSqlToplevel = many ((,) <$> parseKw <*> (space <++> parseOther)) <* eof
parseParen = char '(' <++> inner <++> char ')' where
inner = many (noneOf "()") <++> option "" (parseParen <++> inner)
edit - version with quote support
you can do the same thing as with the parens to support quotes,
import Control.Applicative hiding (many, (<|>))
import Text.Parsec
import Text.Parsec.Combinator
-- overloaded operator to concatenate string results from parsers
class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id
infixl 4 <++>
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g
data Keyword = Select | Update | Delete | From | Where deriving (Eq, Show)
parseKw =
(Select <$ string "select") <|>
(Update <$ string "update") <|>
(Delete <$ string "delete") <|>
(From <$ string "from") <|>
(Where <$ string "where") <?>
"keyword (select, update, delete, from, where)"
-- consume spaces, then eat a word or parenthesis
parseOther = many space <++>
(("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
option "" ((parseParen <|> parseQuote <|> many1 (noneOf "'() \t")) <++> parseOther))
parseSqlToplevel = many ((,) <$> parseKw <*> (space <++> parseOther)) <* eof
parseQuote = char '\'' <++> inner <++> char '\'' where
inner = many (noneOf "'\\") <++>
option "" (char '\\' <++> anyChar <++> inner)
parseParen = char '(' <++> inner <++> char ')' where
inner = many (noneOf "'()") <++>
(parseQuote <++> inner <|> option "" (parseParen <++> inner))
I tried it with parseTest parseSqlToplevel "select ('a(sdf'())b". cheers

Resources