Using ParserResult - f#

The example code below appears to work nicely:
open FParsec
let capitalized : Parser<unit,unit> =(asciiUpper >>. many asciiLower >>. eof)
let inverted : Parser<unit,unit> =(asciiLower >>. many asciiUpper >>. eof)
let capsOrInvert =choice [capitalized;inverted]
You can then do:
run capsOrInvert "Dog";;
run capsOrInvert "dOG";;
and get a success or:
run capsOrInvert "dog";;
and get a failure.
Now that I have a ParserResult, how do I do things with it? For example, print the string backwards?

There are several notable issues with your code.
First off, as noticed in #scrwtp's answer, your parser returns unit. Here's why: operator (>>.) returns only the result returned by the right inner parser. On the other hand, (.>>) would return the result of a left parser, while (.>>.) would return a tuple of both left and right ones.
So, parser1 >>. parser2 >>. eof is essentially (parser1 >>. parser2) >>. eof.
The code in parens completely ignores the result of parser1, and the second (>>.) then ignores the entire result of the parser in parens. Finally, eof returns unit, and this value is being returned.
You may need some meaningful data returned instead, e.g. the parsed string. The easiest way is:
let capitalized = (asciiUpper .>>. many asciiLower .>> eof)
Mind the operators.
The code for inverted can be done in a similar manner.
This parser would be of type Parser<(char * char list), unit>, a tuple of first character and all the remaining ones, so you may need to merge them back. There are several ways to do that, here's one:
let mymerge (c1: char, cs: char list) = c1 :: cs // a simple cons
let pCapitalized = capitalized >>= mymerge
The beauty of this code is that your mymerge is a normal function, working with normal char's, it knows nothing about parsers or so. It just works with the data, and (>>=) operator does the rest.
Note, pCapitalized is also a parser, but it returns a single char list.
Nothing stops you from applying further transitions. As you mentioned printing the string backwards:
let pCapitalizedAndReversed =
capitalized
>>= mymerge
>>= List.rev
I have written the code in this way for purpose. In different lines you see a gradual transition of your domain data, still within the paradigm of Parser. This is an important consideration, because any subsequent transition may "decide" that the data is bad for some reason and raise a parsing exception, for example. Or, alternatively, it may be merged with other parser.
As soon as your domain data (a parsed-out word) is complete, you extract the result as mentioned in another answer.
A minor note. choice is superfluous for only two parsers. Use (<|>) instead. From experience, careful choosing parser combinators is important because a wrong choice deep inside your core parser logic can easily make your parsers dramatically slow.
See FParsec Primitives for further details.

ParserResult is a discriminated union. You simply match the Success and Failure cases.
let r = run capsOrInvert "Dog"
match r with
| Success(result, _, _) -> printfn "Success: %A" result
| Failure(errorMsg, _, _) -> printfn "Failure: %s" errorMsg
But this is probably not what you find tricky about your situation.
The thing about your Parser<unit, unit> type is that the parsed value is of type unit (the first type argument to Parser). What this means is that this parser doesn't really produce any sensible output for you to use - it can only tell you whether it can parse a string (in which case you get back a Success ((), _, _) - carrying the single value of type unit) or not.
What do you expect to get out of this parser?
Edit: This sounds close to what you want, or at least you should be able to pick up some pointers from it. capitalized accepts capitalized strings, inverted accepts capitalized strings that have been reversed and reverses them as part of the parser logic.
let reverse (s: string) =
System.String(Array.rev (Array.ofSeq s))
let capitalized : Parser<string,unit> =
(asciiUpper .>>. manyChars asciiLower)
|>> fun (upper, lower) -> string upper + lower
let inverted : Parser<string,unit> =
(manyChars asciiLower .>>. asciiUpper)
|>> fun (lower, upper) -> reverse (lower + string upper)
let capsOrInvert = choice [capitalized;inverted]
run capsOrInvert "Dog"
run capsOrInvert "doG"
run capsOrInvert "dog"

Related

Why does try not trigger backtracking in this example

I am trying to wrap my head around writing parser using parsec in Haskell, in particular how backtracking works.
Take the following simple parser:
import Text.Parsec
type Parser = Parsec String () String
parseConst :: Parser
parseConst = do {
x <- many digit;
return $ read x
}
parseAdd :: Parser
parseAdd = do {
l <- parseExp;
char '+';
r <- parseExp;
return $ l <> "+" <> r
}
parseExp :: Parser
parseExp = try parseConst <|> parseAdd
pp :: Parser
pp = parseExp <* eof
test = parse pp "" "1+1"
test has value
Left (line 1, column 2):
unexpected '+'
expecting digit or end of input
In my mind this should succeed since I used the try combinator on parseConst in the definition of parseExp.
What am I missing? I am also interrested in pointers for how to debug this in my own, I tried using parserTraced which just allowed me to conclude that it indeed wasn't backtracking.
PS.
I know this is an awful way to write an expression parser, but I'd like to understand why it doesn't work.
There are a lot of problems here.
First, parseConst can never work right. The type says it must produce a String, so read :: String -> String. That particular Read instance requires the input be a quoted string, so being passed 0 or more digit characters to read is always going to result in a call to error if you try to evaluate the value it produces.
Second, parseConst can succeed on matching zero characters. I think you probably wanted some instead of many. That will make it actually fail if it encounters input that doesn't start with a digit.
Third, (<|>) doesn't do what you think. You might think that (a <* c) <|> (b <* c) is interchangeable with (a <|> b) <* c, but it isn't. There is no way to throw try in and make it the same, either. The problem is that (<|>) commits to whichever branch succeed, if one does. In (a <|> b) <* c, if a matches, there's no way later to backtrack and try b there. Doesn't matter how you lob try around, it can't undo the fact that (<|>) committed to a. In contrast, (a <* c) <|> (b <* c) doesn't commit until both a and c or b and c match the input.
This is the situation you're encountering. You have (try parseConst <|> parseAdd) <* eof, after a bit of inlining. Since parseConst will always succeed (see the second issue), parseAdd will never get tried, even if the eof fails. So after parseConst consumes zero or more leading digits, the parse will fail unless that's the end of the input. Working around this essentially requires carefully planning your grammar such that any use of (<|>) is safe to commit locally. That is, the contents of each branch must not overlap in a way that is disambiguated only by later portions of the grammar.
Note that this unpleasant behavior with (<|>) is how the parsec family of libraries work, but not how all parser libraries in Haskell work. Other libraries work without the left bias or commit behavior the parsec family have chosen.

Grouping lines with Parsec

I have a line-based text format I want to parse with Parsec†. A line either starts with a pound sign and specifies a key value pair separated by a colon or is a URL that is described by the previous tags.
Here's a short example:
#foo:bar
#faz:baz
https://example.com
#foo:beep
https://example.net
For simplicity's sake, I'm going to store everything as String. A Tag is a type Tag = (String, String), for example ("foo", "bar"). Ultimately, I'd like to group these as ([Tag], URL).
However, I struggle figuring out how to parse either [one or more tags] or [one URL].
My current approach looks like this:
import qualified System.Environment as Env
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Text as M
type Tag = (String, String)
data Segment = Tags [Tag] | URL String
deriving (Eq, Show)
tagP :: M.Parser Tag
tagP = M.char '#' *> ((,) <$> M.someTill M.printChar (M.char ':') <*> M.someTill M.printChar M.eol) M.<?> "Tag starting with #"
urlP :: M.Parser String
urlP = M.someTill M.printChar M.eol M.<?> "Some URL"
parser :: M.Parser Segment
parser = (Tags <$> M.many tagP) M.<|> (URL <$> urlP)
main :: IO ()
main = do
fname <- head <$> Env.getArgs
res <- M.parseFromFile (parser <* M.eof) fname
print res
If I try to run this on the above sample, I get a parsing error like this:
3:1:
unexpected 'h'
expecting Tag starting with # or end of input
Clearly my use of many in combination with <|> is incorrect. Since the tag parser won't consume any input from the URL parser it cannot be related to backtracking. How do I need to change this to get to the desired result?
The full example is available on GitHub.
† I'm actually using MegaParsec here for better error messages but I think the problem is quite generic and not about any particular implementation of parser combinators.
What you're doing works quite fine, only, at the moment you only parse a single segment (i.e., either only tags or only a URL), but that doesn't consume the whole input. It's eof that's causing the error.
Simply use one more many or some, to allow for multiple segments:
main :: IO ()
main = do
fname <- head <$> Env.getArgs
res <- M.parseFromFile (many parser <* M.eof) fname
print res
#cocreature answered this for me on Twitter.
As leftaroundabout pointed out here, there are two separate mistakes in my code:
The parser itself misuses <|> while it should just sequentially parse the lines and skip to the next parser if it doesn't consume any input.
The invocation (parseFromFile) only applies the parser function a single time and would fail as soon as it would get to the second block.
We can fix the parser and introduce grouping in one go:
parser :: M.Parser ([Tag], String)
parser = liftA2 (,) (M.many tagP) urlP
Afterwards, we just need to apply the change suggested by leftaroundabout:
...
res <- M.parseFromFile (M.many parser <* M.eof) fname
Running this leads to the desired result:
[([("foo","bar"),("faz","baz")],"https://example.com"),([("foo","beep")],"https://example.net")]

try function in parsing lambda expressions

I'm totally new to Haskell and trying to implement a "Lambda calculus" parser, that will be used to read the input to a lambda reducer .. It's required to parse bindings first "identifier = expression;" from a text file, then at the end there's an expression alone ..
till now it can parse bindings only, and displays errors when encountering an expression alone .. when I try to use the try or option functions, it gives a type mismatch error:
Couldn't match type `[Expr]'
with `Text.Parsec.Prim.ParsecT s0 u0 m0 [[Expr]]'
Expected type: Text.Parsec.Prim.ParsecT
s0 u0 m0 (Text.Parsec.Prim.ParsecT s0 u0 m0 [[Expr]])
Actual type: Text.Parsec.Prim.ParsecT s0 u0 m0 [Expr]
In the second argument of `option', namely `bindings'
bindings weren't supposed to return anything, but I tried to add a return statement and it also returned a type mismatch error:
Couldn't match type `[Expr]' with `Expr'
Expected type: Text.Parsec.Prim.ParsecT
[Char] u0 Data.Functor.Identity.Identity [Expr]
Actual type: Text.Parsec.Prim.ParsecT
[Char] u0 Data.Functor.Identity.Identity [[Expr]]
In the second argument of `(<|>)', namely `expressions'
Don't use <|> if you want to allow both
Your program parser does its main work with
program = do
spaces
try bindings <|> expressions
spaces >> eof
This <|> is choice - it does bindings if it can, and if that fails, expressions, which isn't what you want. You want zero or more bindings, followed by expressions, so let's make it do that.
Sadly, even when this works, the last line of your parser is eof and
First, let's allow zero bindings, since they're optional, then let's get both the bindings and the expressions:
bindings = many binding
program = do
spaces
bs <- bindings
es <- expressions
spaces >> eof
return (bs,es)
This error would be easier to find with plenty more <?> "binding" type hints so you can see more clearly what was expected.
endBy doesn't need many
The error message you have stems from the line
expressions = many (endBy expression eol)
which should be
expressions :: Parser [Expr]
expressions = endBy expression eol
endBy works like sepBy - you don't need to use many on it because it already parses many.
This error would have been easier to find with a stronger data type tree, so:
Use try to deal with common prefixes
One of the hard-to-debug problems you've had is when you get the error expecting space or "=" whilst parsing an expression. If we think about that, the only place we expect = is in a binding, so it must be part way through parsing a binding when we've given it an expression. This only happens if our expression starts with an identifier, just like a binding does.
binding sees the first identifier and says "It's OK guys, I've got this" but then finds no = and gives you an error, where we wanted it to backtrack and let expression have a go. The key point is we've already used the identifier input, and we want to unuse it. try is right for that.
Encase your binding parser with try so if it fails, we'll go back to the start of the line and hand over to expression.
binding = try (do
(Var id) <- identifier
_ <- char '='
spaces
exp <- expression
spaces
eol <?> "end of line"
return $ Eq id exp
<?> "binding")
It's important that as far as possible each parser starts with matching something unique to avoid this problem. (try is backtracking, hence inefficient, so should be avoided if possible.)
In particular, avoid starting parsers with spaces, but instead make sure you finish them all with spaces. Your main program can start with spaces if you like, since it's the only alternative.
Use types for most productions - better structure & readability
My first piece of general advice is that you could do with a more fine-grained data type, and should annotate your parsers with their type. At the moment, everything's wrapped up in Expr, which means you can only get error messages about whether you have an Expr or a [Expr]. The fact that you had to add Eq to Expr is a sign you're pushing the type too far.
Usually it's worth making a data type for quite a lot of the productions, and if you import Control.Applicative hiding ((<|>),(<$>),many) Control.Applicative you can use <$> and <*> so that the production, the datatype and the parser are all the same structure:
--<program> ::= <spaces> [<bindings>] <expressions>
data Program = Prog [Binding] [Expr]
program = spaces >> Prog <$> bindings <*> expressions
-- <expression> ::= <abstraction> | factors
data Expression = Ab Abstraction | Fa [Factor]
expression = Ab <$> abstraction <|> Fa <$> factors <?> "expression"
Don't do this with letters for example, but for important things. What counts as important things is a matter of judgement, but I'd start with Identifiers. (You can use <* or *> to not include syntax like = in the results.)
Amended code:
Before refactoring types and using Applicative here
And afterwards here

Using Parsec to write a Read instance

Using Parsec, I'm able to write a function of type String -> Maybe MyType with relative ease. I would now like to create a Read instance for my type based on that; however, I don't understand how readsPrec works or what it is supposed to do.
My best guess right now is that readsPrec is used to build a recursive parser from scratch to traverse a string, building up the desired datatype in Haskell. However, I already have a very robust parser who does that very thing for me. So how do I tell readsPrec to use my parser? What is the "operator precedence" parameter it takes, and what is it good for in my context?
If it helps, I've created a minimal example on Github. It contains a type, a parser, and a blank Read instance, and reflects quite well where I'm stuck.
(Background: The real parser is for Scheme.)
However, I already have a very robust parser who does that very thing for me.
It's actually not that robust, your parser has problems with superfluous parentheses, it won't parse
((1) (2))
for example, and it will throw an exception on some malformed inputs, because
singleP = Single . read <$> many digit
may use read "" :: Int.
That out of the way, the precedence argument is used to determine whether parentheses are necessary in some place, e.g. if you have
infixr 6 :+:
data a :+: b = a :+: b
data C = C Int
data D = D C
you don't need parentheses around a C 12 as an argument of (:+:), since the precedence of application is higher than that of (:+:), but you'd need parentheses around C 12 as an argument of D.
So you'd usually have something like
readsPrec p = needsParens (p >= precedenceLevel) someParser
where someParser parses a value from the input without enclosing parentheses, and needsParens True thing parses a thing between parentheses, while needsParens False thing parses a thing optionally enclosed in parentheses [you should always accept more parentheses than necessary, ((((((1)))))) should parse fine as an Int].
Since the readsPrec p parsers are used to parse parts of the input as parts of the value when reading lists, tuples etc., they must return not only the parsed value, but also the remaining part of the input.
With that, a simple way to transform a parsec parser to a readsPrec parser would be
withRemaining :: Parser a -> Parser (a, String)
withRemaining p = (,) <$> p <*> getInput
parsecToReadsPrec :: Parser a -> Int -> ReadS a
parsecToReadsPrec parsecParser prec input
= case parse (withremaining $ needsParens (prec >= threshold) parsecParser) "" input of
Left _ -> []
Right result -> [result]
If you're using GHC, it may however be preferable to use a ReadPrec / ReadP parser (built using Text.ParserCombinators.ReadP[rec]) instead of a parsec parser and define readPrec instead of readsPrec.

FParsec identifiers vs keywords

For languages with keywords, some special trickery needs to happen to prevent for example "if" from being interpreted as an identifier and "ifSomeVariableName" from becoming keyword "if" followed by identifier "SomeVariableName" in the token stream.
For recursive descent and Lex/Yacc, I've simply taken the approach (as per helpful instruction) of transforming the token stream between the lexer and the parser.
However, FParsec doesn't really seem do a separate lexer step, so I'm wondering what the best way to deal with this is. Speaking of, it seems like Haskell's Parsec supports a lexer layer, but FParsec does not?
I think, this problem is very simple. The answer is that you have to:
Parse out an entire word ([a-z]+), lower case only;
Check if it belongs to a dictionary; if so, return a keyword; otherwise, the parser will fall back;
Parse identifier separately;
E.g. (just a hypothetical code, not tested):
let keyWordSet =
System.Collections.Generic.HashSet<_>(
[|"while"; "begin"; "end"; "do"; "if"; "then"; "else"; "print"|]
)
let pKeyword =
(many1Satisfy isLower .>> nonAlphaNumeric) // [a-z]+
>>= (fun s -> if keyWordSet.Contains(s) then (preturn x) else fail "not a keyword")
let pContent =
pLineComment <|> pOperator <|> pNumeral <|> pKeyword <|> pIdentifier
The code above will parse a keyword or an identifier twice. To fix it, alternatively, you may:
Parse out an entire word ([a-z][A-Z]+[a-z][A-Z][0-9]+), e.g. everything alphanumeric;
Check if it's a keyword or an identifier (lower case and belonging to a dictionary) and either
Return a keyword
Return an identifier
P.S. Don't forget to order "cheaper" parsers first, if it does not ruin the logic.
You can define a parser for whitespace and check if keyword or identifier is followed by it.
For example some generic whitespace parser will look like
let pWhiteSpace = pLineComment <|> pMultilineComment <|> pSpaces
this will require at least one whitespace
let ws1 = skipMany1 pWhiteSpace
then if will look like
let pIf = pstring "if" .>> ws1

Resources