Bottleneck in math parser Haskell - parsing

I got this code below from the wiki books page here. It parses math expressions, and it works very well for the code I'm working on. Although there is one problem, when I start to add layers of brackets to my expression the program slows down dramatically, crashing my computer at some point. It has something to do with the number of operators I have it check for, the more operators I have the less brackets I can parse. Is there anyway to get around or fix this bottleneck?
Any help is much appreciated.
import Text.ParserCombinators.ReadP
-- slower
operators = [("Equality",'='),("Sum",'+'), ("Product",'*'), ("Division",'/'), ("Power",'^')]
-- faster
-- ~ operators = [("Sum",'+'), ("Product",'*'), ("Power",'^')]
skipWhitespace = do
many (choice (map char [' ','\n']))
return ()
brackets p = do
skipWhitespace
char '('
r <- p
skipWhitespace
char ')'
return r
data Tree op = Apply (Tree op) (Tree op) | Branch op (Tree op) (Tree op) | Leaf String deriving Show
leaf = chainl1 (brackets tree
+++ do
skipWhitespace
s <- many1 (choice (map char "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-[]" ))
return (Leaf s))
(return Apply)
tree = foldr (\(op,name) p ->
let this = p +++ do
a <- p +++ brackets tree
skipWhitespace
char name
b <- this
return (Branch op a b)
in this)
(leaf +++ brackets tree)
operators
readA str = fst $ last $ readP_to_S tree str
main = do loop
loop = do
-- ~ try this
-- ~ (a+b+(c*d))
str <- getLine
print $ last $ readP_to_S tree str
loop

This is a classic problem in backtracking (or parallel parsing, they are basically the same thing).... Backtracking grows (at worst) exponentially with the size of the input, so the time to parse something can suddenly explode. In practice backtracking works OK in language parsing for most input, but explodes with recursive infix operator notation. You can see why by considering how many possibile ways this could be parsed (using made up & and % operators):
a & b % c & d
could be parsed as
a & (b % (c & d))
a & ((b % c) & d)
(a & (b % c)) & d
((a & b) % c) & d
This grows like 2^(n-1). The solution to this is to add some operator precidence information earlier in the parse, and throw away all but the sensible cases.... You will need an extra stack to hold pending operators, but you can always go through infix operator expressions in O(1).
LR parsers like yacc do this for you.... With a parser combinator you need to do it by hand. In parsec, there is a Expr package with a buildExpressionParser function that builds this for you.

Related

How do try and <|> functions from parsers lib work

(I use trifecta parser lib). I'm trying to make a parser that parses integers into Right and literal sequences (alphabet, numeral symbols and "-" are allowed) into Left:
*Lib> parseString myParser mempty "123 qwe 123qwe 123-qwe-"
Success [Right 123,Left "qwe",Left "123qwe",Left "123-qwe-"]
That is what I invented:
myParser :: Parser [Either String Integer]
myParser = sepBy1 (try (Right . read <$> (some digit <* notFollowedBy (choice [letter, char '-'])))
<|> Left <$> some (choice [alphaNum, char '-']))
(char ' ')
My problem is that I don't understand why try is needed there (and in any other similar situations). When try is not used, an error appears:
*Lib> parseString myParser mempty "123 qwe 123qwe 123-qwe-"
Failure (ErrInfo {_errDoc = (interactive):1:12: error: expected: digit
1 | 123 qwe 123qwe 123-qwe-<EOF>
| ^ , _errDeltas = [Columns 11 11]})
So try puts the parsing cursor back to where we started on failure. Imagine try isn't used:
123qwe
^ failed there, the cursor position remains there
On the other hand, <|> is like "either". It should run the second parser Left <$> some (choice [alphaNum, char '-'])) (when the first parser failed) and consume just "qwe".
Somewhere I'm wrong.
The second parser would indeed consume the "qwe" part if only it was given a chance to run. But it isn't given such chance.
Look at the definition of (<|>) for Parser:
Parser m <|> Parser n = Parser $ \ eo ee co ce d bs ->
m eo (\e -> n (\a e' -> eo a (e <> e')) (\e' -> ee (e <> e')) co ce d bs) co ce d bs
Hmm... Maybe not such a good idea to look at that. But let's push through nevertheless. To make sense of all those eo, ee, etc., let's look at their explanations on the Parser definition:
The first four arguments are behavior continuations:
epsilon success: the parser has consumed no input and has a result as well as a possible Err; the position and chunk are unchanged (see pure)
epsilon failure: the parser has consumed no input and is failing with the given Err; the position and chunk are unchanged (see empty)
committed success: the parser has consumed input and is yielding the result, set of expected strings that would have permitted this parse to continue, new position, and residual chunk to the continuation.
committed failure: the parser has consumed input and is failing with a given ErrInfo (user-facing error message)
In your case we clearly have "committed failure" - i.e. the Right parser has consumed some input and failed. So in this case it's going to call the fourth continuation - denoted ce in the definition of (<|>).
And now look at the body of the definition: the fourth continuation is passed to parser m unchanged:
m eo (\e -> n (\a e' -> eo a (e <> e')) (\e' -> ee (e <> e')) co ce d bs) co ce d bs
^
|
here it is
This means that the parser returned from (<|>) will call the fourth continuation in all cases in which parser m calls it. Which means that it will fail with "committed failure" in all cases in which the parser m fails with "committed failure". Which is exactly what you observe.

Please explain the behavior of this Parsec permutation parser

Why does this Parsec permutation parser not parse b?
p :: Parser (String, String)
p = permute (pair
<$?> ("", pa)
<|?> ("", pb))
where pair a b = (a, b)
pa :: Parser String
pa = do
char 'x'
many1 (char 'a')
pb :: Parser String
pb = do
many1 (char 'b')
λ> parseTest p "xaabb"
("aa","bb") -- expected result, good
λ> parseTest p "aabb"
("","") -- why "" for b?
Parser pa is configured as optional via <$?> so I don't understand why its failing has impacted the parsing of b. I can change it to optional (char 'x') to get the expected behavior, but I don't understand why.
pa :: Parser String
pa = do
optional (char 'x')
many1 (char 'a')
pb :: Parser String
pb = do
optional (char 'x')
many1 (char 'b')
λ> parseTest p "xaaxbb"
parse error at (line 1, column 2):
unexpected "a"
expecting "b"
λ> parseTest p "xbbxaa"
("aa","bb")
How can both input orderings be supported when we have identical shared prefix "x"?
I also don't understand the impact that consumption of the optional "x" is having on the parse behavior:
pb :: Parser String
pb = do
try px -- with this try x remains unconsumed and "aa" gets parsed
-- without this try x is consumed, but "aa" isn't parsed even though "x" is optional anyway
many1 (char 'b')
px :: Parser Char
px = do
optional (char 'x')
char 'x' <?> "second x"
λ> parseTest p "xaaxbb" -- without try on px
parse error at (line 1, column 2):
unexpected "a"
expecting second x
λ> parseTest p "xaaxbb" -- with try on px
("aa","")
Why parseTest p "aabb" gives ("","")
The permutation parser tries to strip off the front of the given string prefixes that can be parsed by its constituent parsers (pa and pb in this case). Here, it will have tried to apply both pa and pb to "aabb" and failed in both cases - it never even gets around to trying to parse "bb".
Why can't both pa and pb start with optional (char 'x')
Looking at permute, you'll see it uses choice, which in turn relies on (<|>). As the documentation of (<|>) says,
This combinator implements choice. The parser p <|> q first applies p. If it succeeds, the value of p is returned. If p fails without consuming any input, parser q is tried. This combinator is defined equal to the mplus member of the MonadPlus class and the (<|>) member of Alternative.
The parser is called predictive since q is only tried when parser p didn't consume any input (i.e.. the look ahead is 1). This non-backtracking behaviour allows for both an efficient implementation of the parser combinators and the generation of good error messages.
So when you do something like parseTest p "xbb", pa doesn't fail immediately (it consumes and 'x') and then the whole thing fails because it cannot backtrack.
How to make shared prefixes work?
As Daniel has suggested, it is best to factor out your grammar. Alternately, you can use try:
The parser try p behaves like parser p, except that it pretends that it hasn't consumed any input when an error occurs
Based on what we talked about before for (<|>), you ought then to put try in front of both of optional (char 'x').
Why does this Parsec permutation parser not parse b?
Because 'a' is not a valid first character for either parser pa or parser pb.
How can both input orderings be supported when we have identical shared prefix "x"?
Shared prefixes must be factored out of your grammar; or backtracking points inserted (using try) at the cost of performance.

Parse string with lex in Haskell

I'm following Gentle introduction to Haskell tutorial and the code presented there seems to be broken. I need to understand whether it is so, or my seeing of the concept is wrong.
I am implementing parser for custom type:
data Tree a = Leaf a | Branch (Tree a) (Tree a)
printing function for convenience
showsTree :: Show a => Tree a -> String -> String
showsTree (Leaf x) = shows x
showsTree (Branch l r) = ('<':) . showsTree l . ('|':) . showsTree r . ('>':)
instance Show a => Show (Tree a) where
showsPrec _ x = showsTree x
this parser is fine but breaks when there are spaces
readsTree :: (Read a) => String -> [(Tree a, String)]
readsTree ('<':s) = [(Branch l r, u) | (l, '|':t) <- readsTree s,
(r, '>':u) <- readsTree t ]
readsTree s = [(Leaf x, t) | (x,t) <- reads s]
this one is said to be a better solution, but it does not work without spaces
readsTree_lex :: (Read a) => String -> [(Tree a, String)]
readsTree_lex s = [(Branch l r, x) | ("<", t) <- lex s,
(l, u) <- readsTree_lex t,
("|", v) <- lex u,
(r, w) <- readsTree_lex v,
(">", x) <- lex w ]
++
[(Leaf x, t) | (x, t) <- reads s ]
next I pick one of parsers to use with read
instance Read a => Read (Tree a) where
readsPrec _ s = readsTree s
then I load it in ghci using Leksah debug mode (this is unrelevant, I guess), and try to parse two strings:
read "<1|<2|3>>" :: Tree Int -- succeeds with readsTree
read "<1| <2|3> >" :: Tree Int -- succeeds with readsTree_lex
when lex encounters |<2... part of the former string, it splits onto ("|<", _). That does not match ("|", v) <- lex u part of parser and fails to complete parsing.
There are two questions arising:
how do I define parser that really ignores spaces, not requires them?
how can I define rules for splitting encountered literals with lex
speaking of second question -- it is asked more of curiousity as defining my own lexer seems to be more correct than defining rules of existing one.
lex splits into Haskell lexemes, skipping whitespace.
This means that since Haskell permits |< as a lexeme, lex will not split it into two lexemes, since that's not how it parses in Haskell.
You can only use lex in your parser if you're using the same (or similar) syntactic rules to Haskell.
If you want to ignore all whitespace (as opposed to making any whitespace equivalent to one space), it's much simpler and more efficient to first run filter (not.isSpace).
The answer to this seems to be a small gap between text of Gentle introduction to Haskell and its code samples, plus an error in sample code.
there should also be one more lexer, but there is no working example (satisfying my need) in codebase, so I written one. Please point out any flaw in it:
lexAll :: ReadS String
lexAll s = case lex s of
[("",_)] -> [] -- nothing to parse.
[(c, r)] -> if length c == 1 then [(c, r)] -- we will try to match
else [(c, r), ([head s], tail s)]-- not only as it was
any_else -> any_else -- parsed but also splitted
author sais:
Finally, the complete reader. This is not sensitive to white space as
were the previous versions. When you derive the Show class for a data
type the reader generated automatically is similar to this in style.
but lexAll should be used instead of lex (which seems to be said error):
readsTree' :: (Read a) => ReadS (Tree a)
readsTree' s = [(Branch l r, x) | ("<", t) <- lexAll s,
(l, u) <- readsTree' t,
("|", v) <- lexAll u,
(r, w) <- readsTree' v,
(">", x) <- lexAll w ]
++
[(Leaf x, t) | (x, t) <- reads s]

Invertible State monad (and parsers)

Good day, ladies and gentlemen!
I'm constantly writing parsers and codecs. Implementing both parsers and printers seems to be massive code duplication. I wonder whether it is possible to invert a stateful computation, given it is isomorphic by nature.
It is possible to invert pure function composition (Control.Lens.Iso did that by defining a composition operator over isomorphisms). As it can be observed,
Iso bc cb . Iso ab ba = Iso (bc . ab) (ba . cb) -- from Lenses talk
invert (f . g) = (invert g) . (invert f) -- pseudo-code
In other words, to invert a function composition one should compose inverted functions in the opposite order. So, given all primitive isomorphic pairs are defined, one can compose them to get more complicated pairs with no code duplication. Here is an example of pure bidirectional computation (Control.Lens is used, the explanatory video can help you to get the general idea of Lenses, Folds and Traversals):
import Control.Lens
tick :: Num a => Iso' a a
tick = iso (+1) (subtract 1) -- define an isomorphic pair
double :: Num a => Iso' a a
double = iso (+2) (subtract 2) -- and another one
threeTick :: Num a => Iso' a a
-- These are composed via simple function composition!
threeTick = double . tick
main :: IO ()
main = do
print $ (4 :: Int)^.tick -- => 5
print $ (4 :: Int)^.from tick -- => 3
print $ (4 :: Int)^.threeTick -- => 7, Composable
print $ (4 :: Int)^.from threeTick -- => 1, YEAH
As you can see, I didn't need to supply the inverted version of threeTick; it is obtained by backward composition automatically!
Now, let's consider a simple parser.
data FOO = FOO Int Int deriving Show
parseFoo :: Parser FOO
parseFoo = FOO <$> decimal <* char ' '
<*> decimal
parseFoo' :: Parser FOO
parseFoo' = do
first <- decimal
void $ char ' '
second <- decimal
return $ FOO first second
printFoo :: FOO -> BS.ByteString
printFoo (FOO a b) = BS.pack(show a) <>
BS.pack(" ") <>
BS.pack(show b)
main :: IO ()
main = do
print $ parseOnly parseFoo "10 11" -- => Right (FOO 10 11)
print $ parseOnly parseFoo' "10 11" -- => Right (FOO 10 11)
print . printFoo $ FOO 10 11 -- => "10 11"
print . parseOnly parseFoo . printFoo $ FOO 10 11 -- id
You can see that both versions of parseFoo are fairly declarative (thanks to parser combinators). Note the similarity between parseFoo and printFoo. Can I define isomorphisms over primitive parsers (decimal and char) and then just derive the printer (printFoo :: FOO -> String) automatically? Ideally, parser combinators will work as well.
I tried to redefine a monadic >>= operator in order to provide inverted semantics, but I've failed to do so. I feel that one could define inverted Kleisli composition operator (monadic function composition) similarly to composition inversion, but can one use it with ordinary monads?
f :: a -> m b, inverse f :: b -> m a
g :: b -> m c, inverse g :: c -> m b
inverse (f >=> g) = (inverse f) <=< (inverse g)
Why inverse f is of type b -> m a and not m b -> a? The answer is: monadic side effect is an attribute of an arrow, not that of a data type b. The state monad dualization is further discussed in the great Expert to Expert video.
If the solution does exist, could you please supply a working example of printFoo derivation? By the way, here is an interesting paper that could help us find the solution.
You may be interested in digging in further into the lens package for the concept of a Prism.
A Prism can be used as a 'smart constructor' to build something (e.g. a pretty printed string) unconditionally, and match on it (e.g. parse).
You'd have to ignore the laws or treat the laws as holding only up to a quotient though, as the string you get out of pretty printing is very likely not exactly the string you parsed.

Parsing in Haskell for a simple interpreter

I'm relatively new to Haskell with main programming background coming from OO languages. I am trying to write an interpreter with a parser for a simple programming language. So far I have the interpreter at a state which I am reasonably happy with, but am struggling slightly with the parser.
Here is the piece of code which I am having problems with
data IntExp
= IVar Var
| ICon Int
| Add IntExp IntExp
deriving (Read, Show)
whitespace = many1 (char ' ')
parseICon :: Parser IntExp
parseICon =
do x <- many (digit)
return (ICon (read x :: Int))
parseIVar :: Parser IntExp
parseIVar =
do x <- many (letter)
prime <- string "'" <|> string ""
return (IVar (x ++ prime))
parseIntExp :: Parser IntExp
parseIntExp =
do x <- try(parseICon)<|>try(parseIVar)<|>parseAdd
return x
parseAdd :: Parser IntExp
parseAdd =
do x <- parseIntExp
whitespace
string "+"
whitespace
y <- parseIntExp
return (Add x y)
runP :: Show a => Parser a -> String -> IO ()
runP p input
= case parse p "" input of
Left err ->
do putStr "parse error at "
print err
Right x -> print x
The language is slightly more complex, but this is enough to show my problem.
So in the type IntExp ICon is a constant and IVar is a variable, but now onto the problem. This for example runs successfully
runP parseAdd "5 + 5"
which gives (Add (ICon 5) (ICon 5)), which is the expected result. The problem arises when using IVars rather than ICons eg
runP parseAdd "n + m"
This causes the program to error out saying there was an unexpected "n" where a digit was expected. This leads me to believe that parseIntExp isn't working as I intended. My intention was that it will try to parse an ICon, if that fails then try to parse an IVar and so on.
So I either think the problem exists in parseIntExp, or that I am missing something in parseIVar and parseICon.
I hope I've given enough info about my problem and I was clear enough.
Thanks for any help you can give me!
Your problem is actually in parseICon:
parseICon =
do x <- many (digit)
return (ICon (read x :: Int))
The many combinator matches zero or more occurrences, so it's succeeding on "m" by matching zero digits, then probably dying when read fails.
And while I'm at it, since you're new to Haskell, here's some unsolicited advice:
Don't use spurious parentheses. many (digit) should just be many digit. Parentheses here just group things, they're not necessary for function application.
You don't need to do ICon (read x :: Int). The data constructor ICon can only take an Int, so the compiler can figure out what you meant on its own.
You don't need try around the first two options in parseIntExp as it stands--there's no input that would result in either one consuming some input before failing. They'll either fail immediately (which doesn't need try) or they'll succeed after matching a single character.
It's usually a better idea to tokenize first before parsing. Dealing with whitespace at the same time as syntax is a headache.
It's common in Haskell to use the ($) operator to avoid parentheses. It's just function application, but with very low precedence, so that something like many1 (char ' ') can be written many1 $ char ' '.
Also, doing this sort of thing is redundant and unnecessary:
parseICon :: Parser IntExp
parseICon =
do x <- many digit
return (ICon (read x))
When all you're doing is applying a regular function to the result of a parser, you can just use fmap:
parseICon :: Parser IntExp
parseICon = fmap (ICon . read) (many digit)
They're the exact same thing. You can make things look even nicer if you import the Control.Applicative module, which gives you an operator version of fmap, called (<$>), as well as another operator (<*>) that lets you do the same thing with functions of multiple arguments. There's also operators (<*) and (*>) that discard the right or left values, respectively, which in this case lets you parse something while discarding the result, e.g., whitespace and such.
Here's a lightly modified version of your code with some of the above suggestions applied and some other minor stylistic tweaks:
whitespace = many1 $ char ' '
parseICon :: Parser IntExp
parseICon = ICon . read <$> many1 digit
parseIVar :: Parser IntExp
parseIVar = IVar <$> parseVarName
parseVarName :: Parser String
parseVarName = (++) <$> many1 letter <*> parsePrime
parsePrime :: Parser String
parsePrime = option "" $ string "'"
parseIntExp :: Parser IntExp
parseIntExp = parseICon <|> parseIVar <|> parseAdd
parsePlusWithSpaces :: Parser ()
parsePlusWithSpaces = whitespace *> string "+" *> whitespace *> pure ()
parseAdd :: Parser IntExp
parseAdd = Add <$> parseIntExp <* parsePlusWithSpaces <*> parseIntExp
I'm also new to Haskell, just wondering:
will parseIntExp ever make it to parseAdd?
It seems like ICon or IVar will always get parsed before reaching 'parseAdd'.
e.g. runP parseIntExp "3 + m"
would try parseICon, and succeed, giving
(ICon 3) instead of (Add (ICon 3) (IVar m))
Sorry if I'm being stupid here, I'm just unsure.

Resources