Here is my function:
let rec applyAll rules expr =
    rules
    |> List.fold (fun state rule ->
        match state with
        | Some e ->
            match applyRule rule e with
            | Some newE -> Some newE
            | None -> Some e
        | None -> applyRule rule expr) None
    |> Option.bind (applyAll rules)
It takes a set of rules and applies them until the input expression is reduced as far as possible. I could rewrite the Option.bind to be a match expression and it would clearly take advantage of tail call optimization. However, this is more elegant to me, so I would like to keep it as is unless it will be consuming stack unnecessarily. Does F# do TCO with this code?
EDIT: This code always returns None; I'll be fixing that, but I think the question still makes sense.
I pasted your code into a file tco.fs. I added an applyRule function to make it compilable.
tco.fs
let applyRule rule exp =
    Some ""
let rec applyAll rules expr =
    rules
    |> List.fold (fun state rule ->
        match state with
        | Some e ->
            match applyRule rule e with
            | Some newE -> Some newE
            | None -> Some e
        | None -> applyRule rule expr) None
    |> Option.bind (applyAll rules)
Then I made a batch file to analyze the IL.
compile_and_dasm.bat
SET ILDASM="C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.6 Tools\ildasm.exe"
Fsc tco.fs
%ILDASM% /out=tco.il /NOBAR /tok tco.exe
As output we get tco.il, which contains the IL. The relevant function is here.
.method /*06000002*/ public static class [FSharp.Core/*23000002*/]Microsoft.FSharp.Core.FSharpOption`1/*01000003*/<!!b>
applyAll<a,b>(class [FSharp.Core/*23000002*/]Microsoft.FSharp.Collections.FSharpList`1/*01000008*/<!!a> rules,
string expr) cil managed
{
.custom /*0C000003:0A000003*/ instance void [FSharp.Core/*23000002*/]Microsoft.FSharp.Core.CompilationArgumentCountsAttribute/*01000007*/::.ctor(int32[]) /* 0A000003 */ = ( 01 00 02 00 00 00 01 00 00 00 01 00 00 00 00 00 )
// Code size 26 (0x1a)
.maxstack 8
IL_0000: ldarg.0
IL_0001: newobj instance void class Tco/*02000002*//applyAll#13/*02000003*/<!!b,!!a>/*1B000004*/::.ctor(class [FSharp.Core/*23000002*/]Microsoft.FSharp.Collections.FSharpList`1/*01000008*/<!1>) /* 0A000004 */
IL_0006: newobj instance void class Tco/*02000002*//'applyAll#6-1'/*02000004*/<!!a>/*1B000005*/::.ctor() /* 0A000005 */
IL_000b: ldnull
IL_000c: ldarg.0
IL_000d: call !!1 [FSharp.Core/*23000002*/]Microsoft.FSharp.Collections.ListModule/*01000009*/::Fold<!!0,class [FSharp.Core/*23000002*/]Microsoft.FSharp.Core.FSharpOption`1/*01000003*/<string>>(class [FSharp.Core/*23000002*/]Microsoft.FSharp.Core.FSharpFunc`2/*01000002*/<!!1,class [FSharp.Core/*23000002*/]Microsoft.FSharp.Core.FSharpFunc`2/*01000002*/<!!0,!!1>>,
!!1,
class [FSharp.Core/*23000002*/]Microsoft.FSharp.Collections.FSharpList`1/*01000008*/<!!0>) /* 2B000001 */
IL_0012: tail.
IL_0014: call class [FSharp.Core/*23000002*/]Microsoft.FSharp.Core.FSharpOption`1/*01000003*/<!!1> [FSharp.Core/*23000002*/]Microsoft.FSharp.Core.OptionModule/*0100000A*/::Bind<string,!!1>(class [FSharp.Core/*23000002*/]Microsoft.FSharp.Core.FSharpFunc`2/*01000002*/<!!0,class [FSharp.Core/*23000002*/]Microsoft.FSharp.Core.FSharpOption`1/*01000003*/<!!1>>,
class [FSharp.Core/*23000002*/]Microsoft.FSharp.Core.FSharpOption`1/*01000003*/<!!0>) /* 2B000002 */
IL_0019: ret
} // end of method Tco::applyAll
We see here that the tail. prefix is generated. It is a hint from the F# compiler to the JIT compiler (which actually generates the executable machine code) that the call that follows can be made as a tail call.
Whether the tail call is actually executed as such is up to the JIT compiler, as can be read here.
The answer is "no."
As you said, the call can be optimized by "expanding" Option.Bind into a match expression. Doing so would put the recursive call to applyAll properly in the tail position.
With Option.Bind in the tail position, the stack will grow like
+ …
+ applyAll
+ Option.Bind
+ applyAll
+ Option.Bind
+ applyAll
and the F# compiler will not optimize this.
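For comparison, here is a minimal sketch of the "expanded" shape, where the recursive call is a direct self-call in tail position. It is not the asker's code: applyRule below is a hypothetical stand-in (a rule is assumed to be a pattern/replacement pair of strings), and the function returns the reduced expression rather than an option, which also sidesteps the "always returns None" problem mentioned in the question.

```fsharp
// Hypothetical stand-in for the question's applyRule: a rule is a
// (pattern, replacement) pair, applied to the first occurrence only.
let applyRule (pattern: string, replacement: string) (expr: string) : string option =
    let i = expr.IndexOf(pattern, System.StringComparison.Ordinal)
    if i < 0 then None
    else Some (expr.Substring(0, i) + replacement + expr.Substring(i + pattern.Length))

let rec applyAll rules expr =
    // One pass over the rules, tracking whether any rule fired.
    let reduced, changed =
        rules
        |> List.fold (fun (e, changed) rule ->
            match applyRule rule e with
            | Some newE -> (newE, true)
            | None -> (e, changed)) (expr, false)
    if changed then applyAll rules reduced   // direct self-call in tail position
    else reduced                             // no rule fired: fully reduced

printfn "%s" (applyAll [("aa", "a")] "aaaa")
```

A direct self-call like this is the shape the F# compiler can normally turn into a loop, so the stack should stay flat however many passes are needed. The sketch assumes the rules eventually stop firing; a rule whose replacement contains its own pattern would loop forever.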
Writing my own XML parser to learn FParsec I need to test that the XML start and end tags match or have the parser fail.
In the code fragment below ...
The parsers xStartTag and xKey return strings which I want to match.
The parser xContent_UntilCloseTag just returns whatever is between the tags.
ws skips whitespace and str(">") skips a '>'.
The PipeN.pipe10 function is an extension of the standard FParsec primitive pipe5 to feed the result of 10 parsers into a function.
All these parsers compile and work.
How can I get the following parser of type Parser<XELEMENT, USER_STATE> constructing the type XELEMENT_CONTENT to fail when the start and end tags do not match?
00 let xElement_Content : Parser<XELEMENT, USER_STATE> =
01 (PipeN.pipe10 ws xStartTag ws xContent_UntilCloseTag ws xKey ws (str ">") ws
02 (fun stream -> getUserState stream)
03 (fun x1 x2_startTag x3 x4_content x5 x6_closeTag x7 x8 x9 userState ->
04 if x2_startTag.head = x6_closeTag
05 then
06 (userState.Deeper(x2_startTag.head), x2_startTag, x4_content)
07 else Reply(FatalError, messageError ("in xElementContent: head tag (%s) does not match close tag (%s)", x2_startTag.head, x6_closeTag))
08 ) |>> C_XELEMENT_CONTENT
09 )
Line 07 throws the compiler error ...
FS0001: All branches of an 'if' expression must return values of the same type as the first branch which here is 'USER_STATE * XHEADandATTRIBUTES_RECORD * string'. This branch returns a value of the type 'Reply<'a>'.
The code compiles fine if I comment out the if-then-else statement (lines 04, 05, and 07) leaving 06. Types agree in that case. I think I understand that.
But I need to throw an error and have the parser fail when the strings returned from xStartTag and xKey don't agree - How?
I don't think knowing the types is necessary for the answer; but just in case here are the various type definitions...
type USER_STATE =
    {
        tag : string
        depth : int
    }
type XELEMENT_CONTENT = USER_STATE * XHEADandATTRIBUTES_RECORD * string
type XELEMENT_EMPTY = XHEADandATTRIBUTES_RECORD
type XELEMENT =
    | C_XELEMENT_CONTENT of XELEMENT_CONTENT
    | C_XELEMENT_NESTED of USER_STATE * XHEADandATTRIBUTES_RECORD * (XELEMENT list)
    | C_XELEMENT_EMPTY of XELEMENT_EMPTY
type XELEMENT_NESTED = XHEADandATTRIBUTES_RECORD * (XELEMENT list)
I have reviewed the FParsec documentation in detail (especially the User's Guide on 'Parsing with User State') but maybe I missed something?
I don't think you need to maintain user state to match XML tags. Here's a very simple parser that detects mismatches correctly (but doesn't handle nested tags):
open FParsec
let parseTagOpen =
    pstring "<"
    >>. manySatisfy (fun c -> c <> '>')
    .>> pstring ">"
let parseTagClose =
    pstring "</"
    >>. manySatisfy (fun c -> c <> '>')
    .>> pstring ">"
let parseInput =
    parseTagOpen
    .>>. (manySatisfy (fun c -> c <> '<'))
    .>>. parseTagClose
    >>= (fun ((tag1, content), tag2) -> // *** this is the important part ***
        if tag1 = tag2 then preturn (tag1, content)
        else failFatally "mismatch")
[<EntryPoint>]
let main argv =
    [
        "<moo>baa</moo>"
        "<moo>baa</oink>"
    ] |> Seq.iter (fun input ->
        let result = run parseInput input
        printfn ""
        printfn "%s" input
        printfn "%A" result)
    0
Beginner F# programmer here.
So basically I'm totally lost. I've been staring at this question for the past hour, not even knowing how to set up the first line of the let binding. The question asks me to use pattern matching to define a recursive function size : expr -> int that returns the size of its input expression, defined as the number of constructors from type expr in the expression.
This is the recursive function:
size : expr -> int
Here are the constructors:
type oper = Neg | Not | Add | Mul | Sub | Less | Eq | And
type expr =
    | C of int
    | Op1 of oper * expr
    | Op2 of oper * expr * expr
    | If of expr * expr * expr
for example,
size(C 4)
would return 1
and
size (If (C 4, Op2 (Add, C 1, C 2), C ())
would return 6
UPDATED AFTER SUGGESTION: IN PROGRESS!!
let rec size (e : expr) : int =
    match e with
    | C i -> 1
    | Op1 (o, e1) -> size e1 + 1
    | Op2 (o, e1, e2)-> size e2 + 1
    | If (e1, e2, e3) -> size e3 + 1
I'll summarize the hints that you were given in the comments, so that others who find this question can see that it was answered and that you were able to solve the problem:
Your original code didn't need a parameter of type oper; a single parameter of type expr was enough.
You don't need a mutable counter variable; recursion will serve you better.
When writing a recursive function like size, you can sometimes get stuck on "Okay, I need to call my size function here... but since I haven't written it yet, how do I know what it's going to return?" The best way to get unstuck is to pretend you've written the function already, so you already know what it's going to return -- and then use that function. And magically, it all works out: by the time you've finished the function, the function that it was calling (itself) is finished too! Funny how that works out. :-)
At one point you had a function that looked like this:
let rec size (e : expr) : int =
    match e with
    | C i -> 1
    | Op1 (o, e1) -> size e + 1
    // Rest of function omitted
That was giving you a StackOverflowException, because calling size e inside the Op1 match case was an infinite recursion loop. Hint: trace that call mentally, and think about the steps it would go through. It would check e against Op1, and call size e for the second time. That call would check e against Op1, and call size e for the third time. Would that ever terminate? Would those calls ever do anything different from the previous calls, or would they keep looping "forever" until the function stack runs out of space?
Finally, in the cases where you have two or three expr variables, you need to deal with all of them, not just one.
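Putting those hints together, one possible shape for the finished function is sketched below. It is a sketch rather than the asker's final code: every case counts 1 for its own constructor and then recurses into each expr it contains. The type definitions are repeated from the question so the snippet stands alone, and the constant 3 in the second example is an arbitrary value chosen for illustration.

```fsharp
type oper = Neg | Not | Add | Mul | Sub | Less | Eq | And

type expr =
    | C of int
    | Op1 of oper * expr
    | Op2 of oper * expr * expr
    | If of expr * expr * expr

let rec size (e : expr) : int =
    match e with
    | C _ -> 1                                            // a constant is one constructor
    | Op1 (_, e1) -> 1 + size e1                          // the operator itself is not an expr
    | Op2 (_, e1, e2) -> 1 + size e1 + size e2            // count both operands
    | If (e1, e2, e3) -> 1 + size e1 + size e2 + size e3  // count all three branches

printfn "%d" (size (C 4))
printfn "%d" (size (If (C 4, Op2 (Add, C 1, C 2), C 3)))
```

Note that every recursive call is made on a strictly smaller sub-expression (e1, e2 or e3, never e itself), which is what guarantees termination.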
Myello! So I am looking for a concise, efficient and idiomatic way in F# to parse a file or a string. I have a strong preference to treat the input as a sequence of char (char seq). The idea is that every function is responsible for parsing a piece of the input and returning the converted text tupled with the unused input, and is called by a higher-level function that chains the unused input to the following functions and uses the results to build a compound type. Every parsing function should therefore have a signature similar to this one: char seq -> char seq * 'a . If, for example, the function's responsibility is simply to extract the first word, then one approach would be the following:
let parseFirstWord (text: char seq) =
    let rec forTailRecursion t acc =
        let c = Seq.head t
        if c = '\n' then
            (t, acc)
        else
            forTailRecursion (Seq.skip 1 t) (c::acc)
    let rest, reversedWord = forTailRecursion text []
    (rest, List.rev reversedWord)
Now, of course the main problem with this approach is that it extracts the word in reverse order and so you have to reverse it. Its main advantages however are that it uses strictly functional features and proper tail recursion. One could avoid the reversing of the extracted value while losing tail recursion:
let rec parseFirstWord (t: char seq) =
    let c = Seq.head t
    if c = '\n' then
        (t, [])
    else
        let rest, tail = parseFirstWord (Seq.skip 1 t)
        (rest, (c::tail))
Or use a fast mutable data structure underneath instead of using purely functional features, such as:
open System.Collections.Generic
let parseFirstWord (text: char seq) =
    let rec forTailRecursion t (queue: Queue<char>) =
        let c = Seq.head t
        if c = '\n' then
            (t, queue)
        else
            queue.Enqueue c // Enqueue mutates the queue and returns unit
            forTailRecursion (Seq.skip 1 t) queue
    forTailRecursion text (new Queue<char>())
I have no idea how to use OO concepts in F# mind you so corrections to the above code are welcome.
Being new to this language, I would like to be guided in terms of the usual compromises that an F# developer makes. Among the suggested approaches and your own, which should I consider more idiomatic and why? Also, in that particular case, how would you encapsulate the return value: char seq * char seq, char seq * char list or even char seq * Queue<char>? Or would you even consider char seq * String following a proper conversion?
I would definitely have a look at FsLex, FsYacc and FParsec. However, if you just want to tokenize a seq<char> you can use a sequence expression to generate tokens in the right order. Reusing your idea of a recursive inner function, and combining it with a sequence expression, we can stay tail recursive as shown below, and avoid non-idiomatic tools like mutable data structures.
I changed the separator char for easy debugging and the signature of the function. This version produces a seq<string> (your tokens) as result, which is probably easier to consume than a tuple with the current token and the rest of the text. If you just want the first token, you can just take the head. Note that the sequence is generated 'on demand', i.e. the input is parsed only as tokens are consumed through the sequence. Should you need the remainder of the input text next to each token, you can yield a pair in loop instead, but I'm guessing the downstream consumer most likely wouldn't (furthermore, if the input text is itself a lazy sequence, possibly linked to a stream, we don't want to expose it as it should be iterated through only in one place).
let parse (text : char seq) =
    let rec loop t acc =
        seq {
            if Seq.isEmpty t then yield acc
            else
                let c, rest = Seq.head t, Seq.skip 1 t
                if c = ' ' then
                    yield acc
                    yield! loop rest ""
                else yield! loop rest (acc + string c)
        }
    loop text ""
parse "The FOX is mine"
val it : seq<string> = seq ["The"; "FOX"; "is"; "mine"]
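If the remainder of the input really is needed next to each token, the same loop can yield pairs instead. The variant below is only a sketch of that remark; parseWithRest is a hypothetical name, and it keeps the space separator used above.

```fsharp
// Same structure as parse, but each token is paired with the input that
// follows it (everything after the separator that ended the token).
let parseWithRest (text : char seq) =
    let rec loop t acc =
        seq {
            if Seq.isEmpty t then yield (acc, t)
            else
                let c, rest = Seq.head t, Seq.skip 1 t
                if c = ' ' then
                    yield (acc, rest)
                    yield! loop rest ""
                else yield! loop rest (acc + string c)
        }
    loop text ""

// Materialise the remainders as strings purely for display.
parseWithRest "The FOX"
|> Seq.iter (fun (token, rest) ->
    printfn "%s | %s" token (new System.String(Seq.toArray rest)))
```

The remainders are still lazy sequences built from repeated Seq.skip calls, so enumerating them repeatedly gets expensive on long inputs; for anything beyond small strings a list or an index into an array would probably be a better carrier.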
This is not the only 'idiomatic' way of doing this in F#. Every time we need to process a sequence, we can look at the functions made available in the Seq module. The most general of these is fold, which iterates through a sequence once, accumulating a state at each element by running a given function. In the example below accumulate is such a function; it progressively builds the resulting sequence of tokens. Because the final token is not followed by a separator, it is still sitting in the accumulator when the fold ends, so the last two lines append it to the result.
This second implementation also runs in constant stack space (Seq.fold iterates with a loop internally), but note that Seq.fold is eager: it consumes the whole input before returning, so the on-demand behaviour of the first version is lost. It also happens to be shorter, albeit probably a bit less readable.
let parse2 (text : char seq) =
    let accumulate (res, acc) c =
        if c = ' ' then (Seq.append res (Seq.singleton acc), "")
        else (res, acc + string c)
    let (acc, last) = text |> Seq.fold accumulate (Seq.empty, "")
    Seq.append acc (Seq.singleton last)
parse2 "The FOX is mine"
val it : seq<string> = seq ["The"; "FOX"; "is"; "mine"]
One way of lexing/parsing in a way truly unique to F# is by using active patterns. The following simplified example shows the general idea. It can process a calculation string of arbitrary length without producing a stack overflow.
let rec (|CharOf|_|) set = function
    | c :: rest when Set.contains c set -> Some(c, rest)
    | ' ' :: CharOf set (c, rest) -> Some(c, rest)
    | _ -> None
let rec (|CharsOf|) set = function
    | CharOf set (c, CharsOf set (cs, rest)) -> c::cs, rest
    | rest -> [], rest
let (|StringOf|_|) set = function
    | CharsOf set (_::_ as cs, rest) -> Some(System.String(Array.ofList cs), rest)
    | _ -> None
type Token =
    | Int of int
    | Add | Sub | Mul | Div | Mod
    | Unknown
let lex: string -> _ =
    let digits = set ['0'..'9']
    let ops = Set.ofSeq "+-*/%"
    let rec lex chars =
        seq { match chars with
              | StringOf digits (s, rest) -> yield Int(int s); yield! lex rest
              | CharOf ops (c, rest) ->
                  let op =
                      match c with
                      | '+' -> Add | '-' -> Sub | '*' -> Mul | '/' -> Div | '%' -> Mod
                      | _ -> failwith "invalid operator char"
                  yield op; yield! lex rest
              | [] -> ()
              | _ -> yield Unknown }
    List.ofSeq >> lex
lex "1234 + 514 / 500"
// seq [Int 1234; Add; Int 514; Div; Int 500]
As I mentioned in a recent SO question, I'm learning F# by going through the Project Euler problems.
I now have a functioning answer to Problem 3 that looks like this:
let rec findLargestPrimeFactor p n =
    if n = 1L then p
    else
        if n % p = 0L then findLargestPrimeFactor p (n/p)
        else findLargestPrimeFactor (p + 2L) n
let result = findLargestPrimeFactor 3L 600851475143L
However, since there are 2 execution paths that can lead to a different call to findLargestPrimeFactor, I'm not sure it can be optimized for tail recursion. So I came up with this instead:
let rec findLargestPrimeFactor p n =
    if n = 1L then p
    else
        let (p', n') = if n % p = 0L then (p, (n/p)) else (p + 2L, n)
        findLargestPrimeFactor p' n'
let result = findLargestPrimeFactor 3L 600851475143L
Since there's only one path that leads to a tail call to findLargestPrimeFactor, I figure it is indeed going to be optimized for tail recursion.
So my questions:
Can the first implementation be optimized for tail recursion even if there are two distinct recursive calls?
If both versions can be optimized for tail recursion, is there one better (more "functional", faster, etc) than the other?
Your first findLargestPrimeFactor function is tail recursive. A function can be made tail recursive if all recursive calls occur in the tail position, even if there is more than one of them.
Here's the IL of the compiled function:
.method public static int64 findLargestPrimeFactor(int64 p,
int64 n) cil managed
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationArgumentCountsAttribute::.ctor(int32[]) = ( 01 00 02 00 00 00 01 00 00 00 01 00 00 00 00 00 )
// Code size 56 (0x38)
.maxstack 8
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldc.i8 0x1
IL_000b: bne.un.s IL_000f
IL_000d: br.s IL_0011
IL_000f: br.s IL_0013
IL_0011: ldarg.0
IL_0012: ret
IL_0013: ldarg.1
IL_0014: ldarg.0
IL_0015: rem
IL_0016: brtrue.s IL_001a
IL_0018: br.s IL_001c
IL_001a: br.s IL_0026
IL_001c: ldarg.0
IL_001d: ldarg.1
IL_001e: ldarg.0
IL_001f: div
IL_0020: starg.s n
IL_0022: starg.s p
IL_0024: br.s IL_0000
IL_0026: ldarg.0
IL_0027: ldc.i8 0x2
IL_0030: add
IL_0031: ldarg.1
IL_0032: starg.s n
IL_0034: starg.s p
IL_0036: br.s IL_0000
} // end of method LinkedList::findLargestPrimeFactor
The first branch in the else clause (i.e. if n % p = 0L) starts at IL_0013 and continues until IL_0024 where it unconditionally branches back to the entry point of the function.
The second branch in the else clause starts at IL_0026 and continues until the end of the function where it again unconditionally branches back to the start of the function. The F# compiler has converted your recursive function into a loop for both cases of the else clause which contains the recursive calls.
Can the first implementation be optimized for tail recursion even if there are two distinct recursive calls?
The number of recursive branches is orthogonal to tail recursion. Your first function is tail-recursive since the call to findLargestPrimeFactor is the last operation in both branches. If in doubt, you can try to run the function in Release mode (where the tail call optimization option is turned on by default) and observe the results.
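One quick way to observe this is to run a function with the same two-branch shape to a depth that could not possibly fit on the stack. The function below is a made-up example (countDown is not from the question); it counts the even numbers from n down to 1 using two distinct tail calls.

```fsharp
// Hypothetical example: two distinct recursive calls, both in tail position.
let rec countDown (evens: int64) (n: int64) =
    if n = 0L then evens
    elif n % 2L = 0L then countDown (evens + 1L) (n - 1L)
    else countDown evens (n - 1L)

// Ten million nested calls would exhaust the stack if frames accumulated.
let result = countDown 0L 10000000L
printfn "%d" result
```

If this finishes without a StackOverflowException, the recursive calls were not consuming stack. Moving one of the calls out of tail position (for example writing 1L + countDown ... instead of passing an accumulator) should make the same depth fail.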
If both versions can be optimized for tail recursion, is there one better (more "functional", faster, etc) than the other?
There is only a slight difference between the two versions. The second version creates an extra tuple, but that will not slow the computation down much. I consider the first function more readable and straight to the point.
To nitpick, the first variant is shorter using the elif keyword:
let rec findLargestPrimeFactor p n =
    if n = 1L then p
    elif n % p = 0L then findLargestPrimeFactor p (n/p)
    else findLargestPrimeFactor (p + 2L) n
Another version is to use pattern matching:
let rec findLargestPrimeFactor p = function
    | 1L -> p
    | n when n % p = 0L -> findLargestPrimeFactor p (n/p)
    | n -> findLargestPrimeFactor (p + 2L) n
Since the underlying algorithm is the same, it will not be faster either.
I have been searching around the interwebs for a couple of days, trying to get an answer to my questions, and I'm finally admitting defeat.
I have been given a grammar:
Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr
And I have been told to parse, evaluate and print expressions using this grammar
where the operators * + - have their normal meaning
The specific task is to write a function parse :: String -> AST
that takes a string as input and returns an abstract syntax tree when the input is in the correct format (which I can assume it is).
I am told that i might need a suitable data type and that data type might need to derive from some other classes.
Following an example output
data AST = Leaf Int | Sum AST AST | Min AST | ...
Furthermore, I should consider writing a function
tokens::String -> [String]
to split the input string into a list of tokens
Parsing should be accomplished with
ast::[String] -> (AST,[String])
where the input is a list of tokens and it outputs an AST, and to parse sub-expressions i should simply use the ast function recursively.
I should also make a printE function to print the result so that
printE :: AST -> String
printE(parse "* 5 5") yields either "5*5" or "(5*5)"
and also a function to evaluate the expression
evali :: AST -> Int
I would just like to be pointed in the right direction of where I might start. I have little knowledge of Haskell and FP in general, and while trying to solve this task I wrote some string-handling functions in a Java style, which made me realize that I'm way off track.
So a little pointer in the right direction, and maybe an explanation of how the AST should look
Third day in a row and still no running code, i really appreciate any attempt to help me find a solution!
Thanks in advance!
Edit
I might have been unclear:
I'm wondering how i should go about from having read and tokenized an input string to making an AST.
Parsing tokens into an Abstract Syntax Tree
OK, let's take your grammar
Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr
This is a nice easy grammar, because you can tell from the first token what sort of expression it will be.
(If there was something more complicated, like + coming between numbers, or - being used for subtraction as
well as negation, you'd need the list-of-successes trick, explained in
Functional Parsers.)
Let's have some sample raw input:
rawinput = "- 6 + 45 let x = - 5 in * x x"
Which I understand from the grammar represents "(- 6 (+ 45 (let x=-5 in (* x x))))",
and I'll assume you tokenised it as
tokenised_input' = ["-","6","+","4","5","let","x","=","-","5","in","*","x","x"]
which fits the grammar, but you might well have got
tokenised_input = ["-","6","+","45","let","x","=","-","5","in","*","x","x"]
which fits your sample AST better. I think it's good practice to name your AST after bits of your grammar,
so I'm going to go ahead and replace
data AST = Leaf Int | Sum AST AST | Min AST | ...
with
data Expr = E_Int Int | E_Neg Expr | E_Sum Expr Expr | E_Prod Expr Expr | E_Var Char
          | E_Let {letvar::Char,letequal:: Expr,letin::Expr}
  deriving Show
I've named the bits of an E_Let to make it clearer what they represent.
Writing a parsing function
You could use isDigit by adding import Data.Char (isDigit) to help out:
expr :: [String] -> (Expr,[String])
expr [] = error "unexpected end of input"
expr (s:ss) | all isDigit s = (E_Int (read s),ss)
            | s == "-" = let (e,ss') = expr ss in (E_Neg e,ss')
            | s == "+" = (E_Sum e e',ss'') where
                (e,ss') = expr ss
                (e',ss'') = expr ss'
            -- more cases
Yikes! Too many let clauses obscuring the meaning,
and we'll be writing the same code for E_Prod and very much worse for E_Let.
Let's get this sorted out!
The standard way of dealing with this is to write some combinators;
instead of tiresomely threading the input [String]s through our definition, define ways to
mess with the output of parsers (map) and combine
multiple parsers into one (lift).
Clean up the code 1: map
First we should define pmap, our own equivalent of the map function, so we can do pmap E_Neg expr1 ss
instead of let (e,ss') = expr1 ss in (E_Neg e,ss')
pmap :: (a -> b) -> ([String] -> (a,[String])) -> ([String] -> (b,[String]))
nonono, I can't even read that! We need a type synonym:
type Parser a = [String] -> (a,[String])
pmap :: (a -> b) -> Parser a -> Parser b
pmap f p = \ss -> let (a,ss') = p ss
                  in (f a,ss')
But really this would be better if I did
data Parser a = Par ([String] -> (a,[String]))
so I could do
instance Functor Parser where
    fmap f (Par p) = Par (pmap f p)
I'll leave that for you to figure out if you fancy.
Clean up the code 2: combining two parsers
We also need to deal with the situation when we have two parsers to run,
and we want to combine their results using a function. This is called lifting the function to parsers.
liftP2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c
liftP2 f p1 p2 = \ss0 ->
    let (a,ss1) = p1 ss0
        (b,ss2) = p2 ss1
    in (f a b,ss2)
or maybe even three parsers:
liftP3 :: (a -> b -> c -> d) -> Parser a -> Parser b -> Parser c -> Parser d
I'll let you think how to do that.
In the let statement you'll need liftP5 to parse the sections of a let statement,
lifting a function that ignores the "=" and "in". You could make
equals_ :: Parser ()
equals_ [] = error "equals_: expected = but got end of input"
equals_ ("=":ss) = ((),ss)
equals_ (s:ss) = error $ "equals_: expected = but got "++s
and a couple more to help out with this.
Actually, pmap could also be called liftP1, but map is the traditional name for that sort of thing.
Rewritten with the nice combinators
Now we're ready to clean up expr:
expr :: [String] -> (Expr,[String])
expr [] = error "unexpected end of input"
expr (s:ss) | all isDigit s = (E_Int (read s),ss)
            | s == "-" = pmap E_Neg expr ss
            | s == "+" = liftP2 E_Sum expr expr ss
            -- more cases
That'd all work fine. Really, it's OK. But liftP5 is going to be a bit long, and feels messy.
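The combinator idea is not specific to Haskell. As a rough cross-check, here is the same pmap / liftP2 structure sketched in F# (the language of the other threads collected here). The names EInt, ENeg, ESum and EProd are hypothetical F# stand-ins for E_Int, E_Neg, E_Sum and E_Prod, and variables and let are left out to keep it short.

```fsharp
type Expr =
    | EInt of int
    | ENeg of Expr
    | ESum of Expr * Expr
    | EProd of Expr * Expr

// A parser consumes a token list and returns a result plus the unused tokens.
type Parser<'a> = string list -> 'a * string list

// Transform the result of a parser, leaving the remaining tokens alone.
let pmap (f: 'a -> 'b) (p: Parser<'a>) : Parser<'b> =
    fun tokens ->
        let a, rest = p tokens
        (f a, rest)

// Run two parsers in sequence and combine their results with f.
let liftP2 (f: 'a -> 'b -> 'c) (p1: Parser<'a>) (p2: Parser<'b>) : Parser<'c> =
    fun tokens ->
        let a, rest1 = p1 tokens
        let b, rest2 = p2 rest1
        (f a b, rest2)

let rec expr (tokens: string list) : Expr * string list =
    match tokens with
    | [] -> failwith "unexpected end of input"
    | "-" :: rest -> pmap ENeg expr rest
    | "+" :: rest -> liftP2 (fun a b -> ESum (a, b)) expr expr rest
    | "*" :: rest -> liftP2 (fun a b -> EProd (a, b)) expr expr rest
    | s :: rest when s |> Seq.forall (fun (c: char) -> System.Char.IsDigit c) ->
        (EInt (int s), rest)
    | s :: _ -> failwithf "unexpected token %s" s

printfn "%A" (expr ["+"; "1"; "*"; "2"; "3"])
```

The first token alone decides which case applies, which is exactly the property of the grammar pointed out at the start of this answer.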
Taking the cleanup further - the ultra-nice Applicative way
Applicative Functors is the way to go.
Remember I suggested refactoring as
data Parser a = Par ([String] -> (a,[String]))
so you could make it an instance of Functor? Perhaps you don't want to,
because all you've gained is a new name fmap for the perfectly working pmap and
you have to deal with all those Par constructors cluttering up your code.
Perhaps this will make you reconsider, though; we can import Control.Applicative,
then using the data declaration, we can
define <*>, which sort-of means then, and use <$> instead of pmap, with <* meaning
<*>-but-forget-the-result-of-the-right-hand-side, so you would write
expr (s:ss) | s == "let" = E_Let <$> var <* equals_ <*> expr <* in_ <*> expr
Which looks a lot like your grammar definition, so it's easy to write code that works first time.
This is how I like to write Parsers. In fact, it's how I like to write an awful lot of things.
You'd only have to define fmap, <*> and pure, all simple ones, and no long repetitive liftP3, liftP4 etc.
Read up about Applicative Functors. They're great.
Applicative in Learn You a Haskell for Great Good!
Applicative in Haskell wikibook
Hints for making Parser applicative: pure doesn't change the list.
<*> is like liftP2, but the function doesn't come from outside, it comes as the output from p1.
To make a start with Haskell itself, I'd recommend Learn You a Haskell for Great Good!, a very well-written and entertaining guide. Real World Haskell is another oft-recommended starting point.
Edit: A more fundamental introduction to parsing is Functional Parsers. I wanted How to replace a failure by a list of successes by Philip Wadler. Sadly, it doesn't seem to be available online.
To make a start with parsing in Haskell, I think you should first read monadic parsing in Haskell, then maybe this wikibook example, but also then the parsec guide.
Your grammar
Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr
suggests a few abstract data types:
data Dig = Dig_0 | Dig_1 | Dig_2 | Dig_3 | Dig_4 | Dig_5 | Dig_6 | Dig_7 | Dig_8 | Dig_9
data Integ = I_Dig Dig | I_DigInt Dig Integ
data Var = Var_a | Var_b | ... Var_z | Var_A | Var_B | Var_C | ... | Var_Z
data Expr = Expr_I Integ
          | Expr_Neg Expr
          | Expr_Plus Expr Expr
          | Expr_Times Expr Expr
          | Expr_Var Var
          | Expr_let Var Expr Expr
This is inherently a recursively defined syntax tree, no need to make another one.
Sorry about the clunky Dig_ and Integ_ stuff - they have to start with an uppercase.
(Personally I'd want to be converting the Integs to Ints straight away, so would have done newtype Integ = Integ Int, and would probably have done newtype Var = Var Char but that might not suit you.)
Once you've done the basic ones (dig and var, and neg_, plus_, in_ etc.), I'd go with the Applicative interface to build them up, so for example your parser expr for Expr would be something like
expr = Expr_I <$> integ
   <|> Expr_Neg <$> (neg_ *> expr)
   <|> Expr_Plus <$> (plus_ *> expr) <*> expr
   <|> Expr_Times <$> (times_ *> expr) <*> expr
   <|> Expr_Var <$> var
   <|> Expr_let <$> (let_ *> var) <* equals_ <*> expr <* in_ <*> expr
So nearly all the time, your Haskell code is clean and closely resembles the grammar you were given.
OK, so it seems you're trying to build lots and lots of stuff, and you're not really sure exactly where it's all going. I would suggest that getting the definition for AST right, and then trying to implement evali would be a good start.
The grammar you've listed is interesting... You seem to want to input * 5 5, but output 5*5, which is an odd choice. Is that really supposed to be a unary minus, not binary? Similarly, * Expr Expr Var looks like perhaps you might have meant to type * Expr Expr | Var...
Anyway, making some assumptions about what you meant to say, your AST is going to look something like this:
data AST = Leaf Int | Sum AST AST | Minus AST | Var String | Let String AST AST
Now, let us try to do printE. It takes an AST and gives us a string. By the definition above, the AST has to be one of five possible things. You just need to figure out what to print for each one!
printE :: AST -> String
printE (Leaf x ) = show x
printE (Sum x y) = printE x ++ " + " ++ printE y
printE (Minus x ) = "-" ++ printE x
...
show turns an Int into a String. ++ joins two strings together. I'll let you work out the rest of the function. (The tricky thing is if you want it to print brackets to correctly show the order of subexpressions... Since your grammar doesn't mention brackets, I guess not.)
Now, how about evali? Well, this is going to be a similar deal. If the AST is a Leaf x, then x is an Int, and you just return that. If you have, say, Minus x, then x isn't an integer, it's an AST, so you need to turn it into an integer with evali. The function looks something like
evali :: AST -> Int
evali (Leaf x ) = x
evali (Sum x y) = (evali x) + (evali y)
evali (Minus x ) = 0 - (evali x)
...
That works great so far. But wait! It looks like you're supposed to be able to use Let to define new variables, and refer to them later with Var. Well, in that case, you need to store those variables somewhere. And that's going to make the function more complicated.
My recommendation would be to use Data.Map to store a list of variable names and their corresponding values. You'll need to add the variable map to a type signature. You can do it this way:
import qualified Data.Map
import Data.Map (Map, (!))
evali :: AST -> Int
evali ast = evaluate Data.Map.empty ast
evaluate :: Map String Int -> AST -> Int
evaluate m ast =
  case ast of
    ...same as before...
    Let var ast1 ast2 -> evaluate (Data.Map.insert var (evaluate m ast1) m) ast2
    Var var -> m ! var
So evali now just calls evaluate with an empty variable map. When evaluate sees Let, it adds the variable to the map. And when it sees Var, it looks up the name in the map.
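To see the environment-passing idea end to end, here is a small self-contained sketch in F# (again matching the other threads collected here rather than Haskell). It mirrors the five-constructor AST above; F#'s built-in Map plays the role of Data.Map, and the names env, bound and body are my own.

```fsharp
type AST =
    | Leaf of int
    | Sum of AST * AST
    | Minus of AST
    | Var of string
    | Let of string * AST * AST

let rec evaluate (env: Map<string, int>) (ast: AST) : int =
    match ast with
    | Leaf x -> x
    | Sum (x, y) -> evaluate env x + evaluate env y
    | Minus x -> -(evaluate env x)
    | Var name -> Map.find name env          // throws if the variable is unbound
    | Let (name, bound, body) ->
        // Evaluate the bound expression in the current environment, then
        // evaluate the body in an environment extended with the new binding.
        evaluate (Map.add name (evaluate env bound) env) body

let evali (ast: AST) : int = evaluate Map.empty ast

// let x = -5 in x + 7
printfn "%d" (evali (Let ("x", Minus (Leaf 5), Sum (Var "x", Leaf 7))))
```

Because the map is immutable, an inner Let that reuses a name only shadows the outer binding inside its own body, which is usually the scoping you want for this kind of grammar.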
As for parsing a string into an AST in the first place, that is an entire other answer again...