My Question: What is the cleanest way to pretty print an expression without redundant parentheses?
I have the following representation of lambda expressions:
Term ::= Fun(String x, Term t)
| App(Term t1, Term t2)
| Var(String x)
By convention App is left associative, that is a b c is interpreted as (a b) c and function bodies stretch as far to the right as possible, that is, λ x. x y is interpreted as λ x. (x y).
I have a parser that does a good job, but now I want a pretty printer. Here's what I currently have (pseudo scala):
term match {
case Fun(v, t) => "(λ %s.%s)".format(v, prettyPrint(t))
case App(s, t) => "(%s %s)".format(prettyPrint(s), prettyPrint(t))
case Var(v) => v
}
The above printer always puts ( ) around expressions (except for atomic variables). Thus for Fun(x, App(Fun(y, x), y)) it produces
(λ x.((λ y.x) y))
I would like to have
λ x.(λ y.x) y
Here I'll use a simple grammar for infix expressions with the associativity and precedence defined by the following grammar whose operators are listed in ascending order of precedence
E -> E + T | E - T | T left associative
T -> T * F | T / F | F left associative
F -> G ^ F | G right associative
G -> - G | ( E ) | NUM
Given an abstract syntax tree (AST) we convert the AST to a string with only the necessary parenthesis as described in the pseudocode below. We examine relative precedence and associativity as we recursively descend the tree to determine when parenthesis are necessary. Note that all decisions to wrap parentheses around an expression must be made in the parent node.
toParenString(AST) {
if (AST.type == NUM) // simple atomic type (no operator)
return toString(AST)
else if (AST.TYPE == UNARY_MINUS) // prefix unary operator
if (AST.arg.type != NUM AND
precedence(AST.op) > precedence(AST.arg.op))
return "-(" + toParenString(AST.arg) + ")"
else
return "-" + toParenString(AST.arg)
else { // binary operation
var useLeftParen =
AST.leftarg.type != NUM AND
(precedence(AST.op) > precedence(AST.leftarg.op) OR
(precedence(AST.op) == precedence(AST.leftarg.op) AND
isRightAssociative(AST.op)))
var useRightParen =
AST.rightarg.type != NUM AND
(precedence(AST.op) > precedence(AST.rightarg.op) OR
(precedence(AST.op) == precedence(AST.rightarg.op) AND
isLeftAssociative(AST.op)))
var leftString;
if (useLeftParen) {
leftString = "(" + toParenString(AST.leftarg) + ")"
else
leftString = toParenString(AST.leftarg)
var rightString;
if (useRightParen) {
rightString = "(" + toParenString(AST.rightarg) + ")"
else
rightString = toParenString(AST.rightarg)
return leftString + AST.op + rightString;
}
}
Isn't it that you just have to check the types of the arguments of App?
I'm not sure how to write this in scala..
term match {
case Fun(v: String, t: Term) => "λ %s.%s".format(v, prettyPrint(t))
case App(s: Fun, t: App) => "(%s) (%s)".format(prettyPrint(s), prettyPrint(t))
case App(s: Term, t: App) => "%s (%s)".format(prettyPrint(s), prettyPrint(t))
case App(s: Fun, t: Term) => "(%s) %s".format(prettyPrint(s), prettyPrint(t))
case App(s: Term, t: Term) => "%s %s".format(prettyPrint(s), prettyPrint(t))
case Var(v: String) => v
}
Related
I'm currently working on a parser for a simple programming language written in Haskell. I ran into a problem when I tried to allow for binary operators with differing associativities and precedences. Normally this wouldn't be an issue, but since my language allows users to define their own operators, the precedence of operators isn't known by the compiler until the program has already been parsed.
Here are some of the data types I've defined so far:
data Expr
= Var String
| Op String Expr Expr
| ..
data Assoc
= LeftAssoc
| RightAssoc
| NonAssoc
type OpTable =
Map.Map String (Assoc, Int)
At the moment, the compiler parses all operators as if they were right-associative with equal precedence. So if I give it an expression like a + b * c < d the result will be Op "+" (Var "a") (Op "*" (Var "b") (Op "<" (Var "c") (Var "d"))).
I'm trying to write a function called fixExpr which takes an OpTable and an Expr and rearranges the Expr based on the associativities and precedences listed in the OpTable. For example:
operators :: OpTable
operators =
Map.fromList
[ ("<", (NonAssoc, 4))
, ("+", (LeftAssoc, 6))
, ("*", (LeftAssoc, 7))
]
expr :: Expr
expr = Op "+" (Var "a") (Op "*" (Var "b") (Op "<" (Var "c") (Var "d")))
fixExpr operators expr should evaluate to Op "<" (Op "+" (Var "a") (Op "*" (Var "b") (Var "c"))) (Var "d").
How do I define the fixExpr function? I've tried multiple solutions and none of them have worked.
An expression e may be an atomic term n (e.g. a variable or literal), a parenthesised expression, or an application of an infix operator ○.
e ⩴ n | (e) | e1 ○ e2
We need the parentheses to know whether the user entered a * b + c, which we happen to associate as a * (b + c) and need to reassociate as (a * b) + c, or if they entered a * (b + c) literally, which should not be reassociated. Therefore I’ll make a small change to the data type:
data Expr
= Var String
| Group Expr
| Op String Expr Expr
| …
Then the method is simple:
The rebracketing of an expression ⟦e⟧ applies recursively to all its subexpressions.
⟦n⟧ = n
⟦(e)⟧ = (⟦e⟧)
⟦e1 ○ e2⟧ = ⦅⟦e1⟧ ○ ⟦e2⟧⦆
A single reassociation step ⦅e⦆ removes redundant parentheses on the right, and reassociates nested operator applications leftward in two cases: if the left operator has higher precedence, or if the two operators have equal precedence, and are both left-associative. It leaves nested infix applications alone, that is, associating rightward, in the opposite cases: if the right operator has higher precedence, or the operators have equal precedence and right associativity. If the associativities are mismatched, then the result is undefined.
⦅e ○ n⦆ = e ○ n
⦅e1 ○ (e2)⦆ = ⦅e1 ○ e2⦆
⦅e1 ○ (e2 ● e3)⦆ =
⦅e1 ○ e2⦆ ● e3, if:
a. P(○) > P(●); or
b. P(○) = P(●) and A(○) = A(●) = L
e1 ○ (e2 ● e3), if:
a. P(○) < P(●); or
b. P(○) = P(●) and A(○) = A(●) = R
undefined otherwise
NB.: P(o) and A(o) are respectively the precedence and associativity (L or R) of operator o.
This can be translated fairly literally to Haskell:
fixExpr operators = reassoc
where
-- 1.1
reassoc e#Var{} = e
-- 1.2
reassoc (Group e) = Group (reassoc e)
-- 1.3
reassoc (Op o e1 e2) = reassoc' o (reassoc e1) (reassoc e2)
-- 2.1
reassoc' o e1 e2#Var{} = Op o e1 e2
-- 2.2
reassoc' o e1 (Group e2) = reassoc' o e1 e2
-- 2.3
reassoc' o1 e1 r#(Op o2 e2 e3) = case compare prec1 prec2 of
-- 2.3.1a
GT -> assocLeft
-- 2.3.2a
LT -> assocRight
EQ -> case (assoc1, assoc2) of
-- 2.3.1b
(LeftAssoc, LeftAssoc) -> assocLeft
-- 2.3.2b
(RightAssoc, RightAssoc) -> assocRight
-- 2.3.3
_ -> error $ concat
[ "cannot mix ‘", o1
, "’ ("
, show assoc1
, " "
, show prec1
, ") and ‘"
, o2
, "’ ("
, show assoc2
, " "
, show prec2
, ") in the same infix expression"
]
where
(assoc1, prec1) = opInfo o1
(assoc2, prec2) = opInfo o2
assocLeft = Op o2 (Group (reassoc' o1 e1 e2)) e3
assocRight = Op o1 e1 r
opInfo op = fromMaybe (notFound op) (Map.lookup op operators)
notFound op = error $ concat
[ "no precedence/associativity defined for ‘"
, op
, "’"
]
Note the recursive call in assocLeft: by reassociating the operator applications, we may have revealed another association step, as in a chain of left-associative operator applications like a + b + c + d = (((a + b) + c) + d).
I insert Group constructors in the output for illustration, but they can be removed at this point, since they’re only necessary in the input.
This hasn’t been tested very thoroughly at all, but I think the idea is sound, and should accommodate modifications for more complex situations, even if the code leaves something to be desired.
An alternative that I’ve used is to parse expressions as “flat” sequences of operators applied to terms, and then run a separate parsing pass after name resolution, using e.g. Parsec’s operator precedence parser facility, which would handle these details automatically.
I am working on making a Lisp interpreter in OCaml. I naturally started with the front-end. So far I have an S-expression parsing algorithm that works most of the time. For both simple S-expressions like (a b) and ((a b) (c d)) my function, ast_as_str, shows that the output list structure is incorrect. I've documented that below. After trying countless variations on parse nothing seems to work. Does someone adept in writing parsers in OCaml have a suggestion as to how I can fix my code?
type s_expression = Nil | Atom of string | Pair of s_expression * s_expression
let rec parse tokens =
match tokens with
| [] -> Nil
| token :: rest ->
match token with
| "(" -> parse rest
| ")" -> Pair(Nil, parse rest)
| atom -> Pair(Atom atom, parse rest)
let rec ast_as_str ast =
match ast with
| Nil -> "nil"
| Atom a -> Printf.sprintf "%s" a
| Pair(a, b) -> Printf.sprintf "(%s %s)" (ast_as_str a) (ast_as_str b);;
let check_output test = print_endline (ast_as_str (parse test));;
(*
Input:
(a b)
Output:
(a (b (nil nil)))
Almost correct...
*)
check_output ["("; "a"; "b"; ")"];;
(*
Input:
((w x) (y z))
Output:
(w (x (nil (y (z (nil (nil nil)))))))
Incorrect.
*)
check_output ["("; "("; "w"; "x"; ")"; "("; "y"; "z"; ")"; ")"]
I'm going to assume this isn't homework. If it is, I will change the answer to some less specific hints.
A recursive descent parser works by recognizing the beginning token of a construct, then parsing the contents of the construct, then (very often) recognizing the ending token of the construct. S-expressions have just one construct, the parenthesized list. Your parser isn't doing the recognizing of the end of the construct.
If you assume your parser works correctly, then encountering a right parenthesis ) is a syntax error. There shouldn't be any unmatched right parentheses, and matching right parentheses are parsed as part of the parenthesized list construct (as I described above).
If you swear that this is just a personal project I'd be willing to write up a parser. But you should try writing up something as described above.
Note that when you see an atom, you aren't seeing a pair. It's not correct to return Pair (Atom xyz, rest) when seeing an atom.
Update
The way to make things work in a functional setting is to have the parsing functions return not only the construct that they saw, but also the remaining tokens that haven't been parsed yet.
The following code works for your examples and is probably pretty close to correct:
let rec parse tokens =
match tokens with
| [] -> failwith "Syntax error: end of input"
| "(" :: rest ->
(match parselist rest with
| (sexpr, ")" :: rest') -> (sexpr, rest')
| _ -> failwith "Syntax error: unmatched ("
)
| ")" :: _ -> failwith "Syntax error: unmatched )"
| atom :: rest -> (Atom atom, rest)
and parselist tokens =
match tokens with
| [] | ")" :: _ -> (Nil, tokens)
| _ ->
let (sexpr1, rest) = parse tokens in
let (sexpr2, rest') = parselist rest in
(Pair (sexpr1, sexpr2), rest')
You can define check_output like this:
let check_output test =
let (sexpr, toks) = parse test in
if toks <> [] then
Printf.printf "(extra tokens in input)\n";
print_endline (ast_as_str sexpr)
Here's what I see for your two test cases:
# check_output ["("; "a"; "b"; ")"];;
(a (b nil))
- : unit = ()
# check_output ["("; "("; "w"; "x"; ")"; "("; "y"; "z"; ")"; ")"];;
((w (x nil)) ((y (z nil)) nil))
- : unit = ()
I think these are the right results.
I'm programming the precedence climbing algorithm in Haskell, but for a reason unknown to me, does not work. I think that Parsec state info is lost at some point, but I don't even know that is the source of the error:
module PrecedenceClimbing where
import Text.Parsec
import Text.Parsec.Char
{-
Algorithm
compute_expr(min_prec):
result = compute_atom()
while cur token is a binary operator with precedence >= min_prec:
prec, assoc = precedence and associativity of current token
if assoc is left:
next_min_prec = prec + 1
else:
next_min_prec = prec
rhs = compute_expr(next_min_prec)
result = compute operator(result, rhs)
return result
-}
type Precedence = Int
data Associativity = LeftAssoc
| RightAssoc
deriving (Eq, Show)
data OperatorInfo = OPInfo Precedence Associativity (Int -> Int -> Int)
mkOperator :: Char -> OperatorInfo
mkOperator = \c -> case c of
'+' -> OPInfo 1 LeftAssoc (+)
'-' -> OPInfo 1 LeftAssoc (-)
'*' -> OPInfo 2 LeftAssoc (*)
'/' -> OPInfo 2 LeftAssoc div
'^' -> OPInfo 3 RightAssoc (^)
getPrecedence :: OperatorInfo -> Precedence
getPrecedence (OPInfo prec _ _) = prec
getAssoc :: OperatorInfo -> Associativity
getAssoc (OPInfo _ assoc _) = assoc
getFun :: OperatorInfo -> (Int -> Int -> Int)
getFun (OPInfo _ _ fun) = fun
number :: Parsec String () Int
number = do
spaces
fmap read $ many1 digit
operator :: Parsec String () OperatorInfo
operator = do
spaces
fmap mkOperator $ oneOf "+-*/^"
computeAtom = do
spaces
number
loop minPrec res = (do
oper <- operator
let prec = getPrecedence oper
if prec >= minPrec
then do
let assoc = getAssoc oper
next_min_prec = if assoc == LeftAssoc
then prec + 1
else prec
rhs <- computeExpr(next_min_prec)
loop minPrec $ getFun oper res rhs
else return res) <|> (return res)
computeExpr :: Int -> Parsec String () Int
computeExpr minPrec = (do
result <- computeAtom
loop minPrec result) <|> (computeAtom)
getResult minPrec = parse (computeExpr minPrec) ""
My program for some reason is only processing the first operation or the first operand depending on the case, but does not go any further
GHCi session:
*PrecedenceClimbing> getResult 1 "46+10"
Right 56
*PrecedenceClimbing> getResult 1 "46+10+1"
Right 56
I'm not sure exactly what's wrong with your code but I'll offer these comments:
(1) These statements are not equivalent:
Generic Imperative: rhs = compute_expr(next_min_prec)
Haskell: rhs <- computeExpr(next_min_prec)
The imperative call to compute_expr will always return. The Haskell call may fail in which case the stuff following the call never happens.
(2) You are really working against Parsec's strengths by trying to parse tokens one at a time in sequence. To see the "Parsec way" of generically parsing expressions with operators of various precedences and associativities, have a look at:
buildExpression
Parsec and Expression Printing
Update
I've posted a solution to http://lpaste.net/165651
Traditionally, arithmetic operators are considered to be binary (left or right associative), thus most tools are dealing only with binary operators.
Is there an easy way to parse arithmetic operators with Parsec, which can have an arbitrary number of arguments?
For example, the following expression should be parsed into the tree
(a + b) + c + d * e + f
Yes! The key is to first solve a simpler problem, which is to model + and * as tree nodes with only two children. To add four things, we'll just use + three times.
This is a great problem to solve since there's a Text.Parsec.Expr module for just this problem. Your example is actually parseable by the example code in the documentation. I've slightly simplified it here:
module Lib where
import Text.Parsec
import Text.Parsec.Language
import qualified Text.Parsec.Expr as Expr
import qualified Text.Parsec.Token as Tokens
data Expr =
Identifier String
| Multiply Expr Expr
| Add Expr Expr
instance Show Expr where
show (Identifier s) = s
show (Multiply l r) = "(* " ++ (show l) ++ " " ++ (show r) ++ ")"
show (Add l r) = "(+ " ++ (show l) ++ " " ++ (show r) ++ ")"
-- Some sane parser combinators that we can plagiarize from the Haskell parser.
parens = Tokens.parens haskell
identifier = Tokens.identifier haskell
reserved = Tokens.reservedOp haskell
-- Infix parser.
infix_ operator func =
Expr.Infix (reserved operator >> return func) Expr.AssocLeft
parser =
Expr.buildExpressionParser table term <?> "expression"
where
table = [[infix_ "*" Multiply], [infix_ "+" Add]]
term =
parens parser
<|> (Identifier <$> identifier)
<?> "term"
Running this in GHCi:
λ> runParser parser () "" "(a + b) + c + d * e + f"
Right (+ (+ (+ (+ a b) c) (* d e)) f)
There are lots of ways of converting this tree to the desired form. Here's a hacky gross slow one:
data Expr' =
Identifier' String
| Add' [Expr']
| Multiply' [Expr']
deriving (Show)
collect :: Expr -> (Expr -> Bool) -> [Expr]
collect e f | (f e == False) = [e]
collect e#(Add l r) f =
collect l f ++ collect r f
collect e#(Multiply l r) f =
collect l f ++ collect r f
isAdd :: Expr -> Bool
isAdd (Add _ _) = True
isAdd _ = False
isMultiply :: Expr -> Bool
isMultiply (Multiply _ _) = True
isMultiply _ = False
optimize :: Expr -> Expr'
optimize (Identifier s) = Identifier' s
optimize e#(Add _ _) = Add' (map optimize (collect e isAdd))
optimize e#(Multiply _ _) = Multiply' (map optimize (collect e isMultiply))
I will note, however, that almost always Expr is Good Enough™ for the purposes of a parser or compiler.
I have a discriminated union for expressions like this one (EQ =; GT >; etc)
(AND (OR (EQ X 0)
(GT X 10))
(OR (EQ Y 0)
(GT Y 10)))
I want to create instances of DU from such expressions saved in file/database.
How do i do it? If it is not feasible, what is the best way to approach it in F#?
Daniel: these expressions are saved in prefix format (as above) as text and will be parsed in F#. Thanks.
If you just want to know how to model these expressions using DUs, here's one way:
type BinaryOp =
| EQ
| GT
type Expr =
| And of Expr * Expr
| Or of Expr * Expr
| Binary of BinaryOp * Expr * Expr
| Var of string
| Value of obj
let expr =
And(
Or(
Binary(EQ, Var("X"), Value(0)),
Binary(GT, Var("X"), Value(10))),
Or(
Binary(EQ, Var("Y"), Value(0)),
Binary(GT, Var("Y"), Value(10))))
Now, this may be too "loose," i.e., it permits expressions like And(Value(1), Value(2)), which may not be valid according to your grammar. But this should give you an idea of how to approach it.
There are also some good examples in the F# Programming wikibook.
If you need to parse these expressions, I highly recommend FParsec.
Daniel's answer is good. Here's a similar approach, along with a simple top-down parser built with active patterns:
type BinOp = | And | Or
type Comparison = | Gt | Eq
type Expr =
| BinOp of BinOp * Expr * Expr
| Comp of Comparison * string * int
module private Parsing =
// recognize and strip a leading literal
let (|Lit|_|) lit (s:string) =
if s.StartsWith(lit) then Some(s.Substring lit.Length)
else None
// strip leading whitespace
let (|NoWs|) (s:string) =
s.TrimStart(' ', '\t', '\r', '\n')
// parse a binary operator
let (|BinOp|_|) = function
| Lit "AND" r -> Some(And, r)
| Lit "OR" r -> Some(Or, r)
| _ -> None
// parse a comparison operator
let (|Comparison|_|) = function
| Lit "GT" r -> Some(Gt, r)
| Lit "EQ" r -> Some(Eq, r)
| _ -> None
// parse a variable (alphabetical characters only)
let (|Var|_|) s =
let m = System.Text.RegularExpressions.Regex.Match(s, "^[a-zA-Z]+")
if m.Success then
Some(m.Value, s.Substring m.Value.Length)
else
None
// parse an integer
let (|Int|_|) s =
let m = System.Text.RegularExpressions.Regex.Match(s, #"^-?\d+")
if m.Success then
Some(int m.Value, s.Substring m.Value.Length)
else
None
// parse an expression
let rec (|Expr|_|) = function
| NoWs (Lit "(" (BinOp (b, Expr(e1, Expr(e2, Lit ")" rest))))) ->
Some(BinOp(b, e1, e2), rest)
| NoWs (Lit "(" (Comparison (c, NoWs (Var (v, NoWs (Int (i, Lit ")" rest))))))) ->
Some(Comp(c, v, i), rest)
| _ -> None
let parse = function
| Parsing.Expr(e, "") -> e
| s -> failwith (sprintf "Not a valid expression: %s" s)
let e = parse #"
(AND (OR (EQ X 0)
(GT X 10))
(OR (EQ Y 0)
(GT Y 10)))"