Recursive descent parsing: high precedence unary operators

I've figured out how to implement binary operators with precedence, like this (pseudocode):
method plus
  times()
  while(consume(plus_t)) do
    times()
  end
end
method times
  number()
  while(consume(times_t)) do
    number()
  end
end
// plus() is the root operation
// omitted: number() consumes a number token
So when I parse 4 + 5 * 6 it would:
plus
  times
    number (4 consumed)
  plus_t consumed
  times
    number (5 consumed)
    times_t consumed
    number (6 consumed)
However, when I try adding a minus method (prefix negation like -4, not infix subtraction like 4 - 5):
method minus
  consume(minus_t)
  plus()
end
It gets a very low precedence, so -4 + 5 is parsed as -(4 + 5) rather than (-4) + 5, which is undesirable.
What can I do to make a high precedence unary operator?

You haven't said where in the hierarchy you're adding the minus method, but it looks like you're adding it above plus and making it the root.
You need to put it last (just above number) if you want unary - to have a higher precedence than + and *.
In your pseudocode, something like this should work:
method times
  minus()
  while(consume(times_t))
    minus()
  end
end
method minus
  if(consume(minus_t))
    // next number should have a unary minus attached
    number()
  else
    number()
  end
end
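For readers who want to run the idea, here is a minimal Python sketch of that hierarchy (plus, then times, then minus, then number). The class and method names are illustrative, and unlike the pseudocode, minus calls itself instead of number() so that inputs such as --4 also parse. It builds S-expression strings so the result shows where each operator attached.

```python
class Parser:
    """Recursive descent with the hierarchy plus -> times -> minus -> number."""

    def __init__(self, text):
        self.text = text.replace(" ", "")
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def consume(self, ch):
        if self.peek() == ch:
            self.pos += 1
            return True
        return False

    def plus(self):  # root rule, lowest precedence
        left = self.times()
        while self.consume("+"):
            left = f"(+ {left} {self.times()})"
        return left

    def times(self):
        left = self.minus()
        while self.consume("*"):
            left = f"(* {left} {self.minus()})"
        return left

    def minus(self):  # deepest rule, so it binds tightest
        if self.consume("-"):
            return f"(- {self.minus()})"  # recursive, so "--4" also works
        return self.number()

    def number(self):
        start = self.pos
        while self.peek().isdigit():
            self.pos += 1
        if start == self.pos:
            raise SyntaxError(f"expected number at index {self.pos}")
        return self.text[start:self.pos]


def parse(text):
    p = Parser(text)
    tree = p.plus()
    if p.pos != len(p.text):
        raise SyntaxError(f"unexpected {p.peek()!r} at index {p.pos}")
    return tree


print(parse("-4+5"))      # (+ (- 4) 5)
print(parse("-4*-5+-4"))  # (+ (* (- 4) (- 5)) (- 4))
```

Placing minus at the root instead, as in the question, would make it wrap the whole plus() result, which is exactly the -(4 + 5) behavior described there.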
I'm learning about parsers these days, so I wrote a complete parser based on your pseudocode. It's in LiveScript, but it should be easy to follow.
Edit: Running example on jsfiddle.net - http://jsfiddle.net/Dogbert/7Pmwc/
parse = (string) ->
  index = 0
  is-digit = (d) -> '0' <= d <= '9'
  plus = ->
    str = times()
    while consume "+"
      str = "(+ #{str} #{times()})"
    str
  times = ->
    str = unary-minus()
    while consume "*"
      str = "(* #{str} #{unary-minus()})"
    str
  unary-minus = ->
    if consume "-"
      "(- #{number()})"
    else
      number()
  number = ->
    if is-digit peek()
      ret = peek()
      advance()
      while is-digit peek()
        ret += peek()
        advance()
      ret
    else
      throw "expected number at index = #{index}, got #{peek()}"
  peek = ->
    string[index]
  advance = ->
    index++
  consume = (what) ->
    if peek() == what
      advance()
      true
  plus()
console.log parse "4+5*6"
console.log parse "-4+5"
console.log parse "-4*-5+-4"
Output:
(+ 4 (* 5 6))
(+ (- 4) 5)
(+ (* (- 4) (- 5)) (- 4))
PS: you may want to look at Operator-precedence Parsers for parsing complex precedence/associativity relatively easily.
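The operator-precedence approach from the PS can be sketched with precedence climbing, one common variant of it. In this Python sketch the BINARY table and UNARY_MINUS_PREC are illustrative names and values (not from the answer); giving prefix minus a higher binding power than * is what makes -4+5 come out as (+ (- 4) 5). All binary operators are treated as left-associative.

```python
BINARY = {"+": 1, "-": 1, "*": 2, "/": 2}  # operator -> binding power (illustrative)
UNARY_MINUS_PREC = 3                         # tighter than every binary operator


def tokenize(text):
    """Split into multi-digit numbers and single-character symbols."""
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
        elif text[i].isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def parse_prec(text):
    """Return an S-expression string for text using precedence climbing."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def operand():
        tok = peek()
        if tok == "-":  # "-" in operand position is prefix minus
            advance()
            return f"(- {expr(UNARY_MINUS_PREC)})"
        if tok == "(":
            advance()
            inner = expr(0)
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return inner
        if tok is not None and tok.isdigit():
            return advance()
        raise SyntaxError(f"unexpected token {tok!r}")

    def expr(min_prec):
        left = operand()
        # "-" after an operand is binary subtraction
        while peek() in BINARY and BINARY[peek()] >= min_prec:
            op = advance()
            # prec + 1 on the right makes equal-precedence operators left-associative
            right = expr(BINARY[op] + 1)
            left = f"({op} {left} {right})"
        return left

    tree = expr(0)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_prec("-4+5"))   # (+ (- 4) 5)
print(parse_prec("1-2-3"))  # (- (- 1 2) 3)
```

Adding a new operator then means adding a table entry rather than a new grammar method.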

Related

Pure pattern matching

I am building a function that counts how many times a character appears in a string after the nth position.
countCh ("aaabbbccc", 3, 'b')
val it: int = 2
In C, I would use an accumulator with a while loop. But I am trying to learn the functional side of F#, where this approach is discouraged.
So I used guards to test a few conditions and built the function:
let rec countCh (s:string, n:int, ch:char) =
    match s, n, ch with
    | (s, n, ch) when n > s.Length -> 0 //p1
    | (s, n, ch) when n < 0 -> 0 //p2
    | (s, n, ch) when s.[n] <> ch -> countCh(s, n + 1, ch) //p3
    | (s, n, ch) when s.[n] = ch -> 1 + countCh(s, n + 1, ch) //p4
The coexistence of patterns 3 and 4 is problematic (impossible, I am afraid). Even if it compiles, I have not been able to make it work. How can this task be handled functionally?
First, the coexistence of these branches is not problematic. They don't conflict with each other. Why do you think that it's problematic? Is it because you get an "Incomplete pattern match" compiler warning? That warning does not tell you that the branches conflict, it tells you that the compiler can't prove that the four branches cover all possibilities. Or do you think that for some other reason? If you want your questions to be answered accurately, you'll have to ask them more clearly.
Second, you're abusing pattern matching. Look: there are no patterns! The patterns in every branch are exactly the same, and trivial. Only the guards differ. This looks very counterintuitive within a match, but would be plainly expressed with if..elif:
let rec countCh (s:string) n ch =
    if n >= s.Length || n < 0 then 0
    elif s.[n] = ch then 1 + countCh s (n + 1) ch
    else countCh s (n + 1) ch
NOTE 1: see how I made the parameters curried? Always use curried form, unless there is a very strong reason to use tupled. Curried parameters are much more convenient to use on the caller side.
NOTE 2: your condition n > s.Length was incorrect: string indices go from 0 to s.Length-1, so the bail condition should be n >= s.Length. It is corrected in my code.
Finally, since this is an exercise, I must point out that the recursion is not tail recursion. Look at the second branch (in my code): it calls the function recursively and then adds one to the result. Since you have to do something with the result of the recursive call, the recursion can't be "tail". This means you risk stack overflow on very long inputs.
To make this into tail recursion, you need to turn the function "inside out", so to say. Instead of returning the result from every call, you need to pass it into every call (aka "accumulator"), and only return from the terminal case:
let rec countCh (s:string) n ch countSoFar =
    if n >= s.Length || n < 0 then countSoFar
    elif s.[n] = ch then countCh s (n+1) ch (countSoFar+1)
    else countCh s (n+1) ch countSoFar
// Usage:
countCh "aaaabbbccc" 5 'b' 0
This way, every recursive call is the "last" call (i.e. the function doesn't do anything with the result, but passes it straight out to its own caller). This is called "tail recursion" and can be compiled to work in constant stack space (as opposed to linear).
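Python does not perform tail-call elimination, so it can't demonstrate the constant-stack behavior directly. As a hedged illustration, here is the accumulator version transliterated into Python next to the loop that tail-call elimination effectively turns it into. The function names are hypothetical.

```python
import sys


def count_ch_acc(s, n, ch, count_so_far=0):
    """Accumulator-passing recursion, mirroring the F# countCh with countSoFar.
    The recursive call is in tail position, but CPython still uses one frame per call."""
    if n >= len(s) or n < 0:
        return count_so_far
    if s[n] == ch:
        return count_ch_acc(s, n + 1, ch, count_so_far + 1)
    return count_ch_acc(s, n + 1, ch, count_so_far)


def count_ch_loop(s, n, ch):
    """What tail-call elimination effectively produces: the call becomes a jump,
    so the parameters are updated in place and the stack does not grow."""
    count_so_far = 0
    while 0 <= n < len(s):
        if s[n] == ch:
            count_so_far += 1
        n += 1
    return count_so_far


print(count_ch_acc("aaaabbbccc", 5, 'b'))   # 2
print(count_ch_loop("aaaabbbccc", 5, 'b'))  # 2

long_input = "b" * (sys.getrecursionlimit() * 2)
print(count_ch_loop(long_input, 0, 'b'))    # works: constant stack
# count_ch_acc(long_input, 0, 'b') would raise RecursionError in CPython
```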
I agree with the other answers, but I'd like to help you with your original question. You need to indent the function, and you have an off-by-one bug:
let rec countCh (s:string, n:int, ch:char) =
    match s, n, ch with
    | s, n, _ when n >= s.Length-1 -> 0 //p1
    | s, _, _ when n < 0 -> 0 //p2
    | s, n, ch when s.[n+1] <> ch -> countCh(s, n+1, ch) //p3
    | s, n, ch when s.[n+1] = ch -> 1 + countCh(s, n+1, ch) //p4
I'd suggest not writing it yourself, but letting the library functions help:
let countCh (s: string, n, c) =
    s.Substring(n+1).ToCharArray()
    |> Seq.filter ((=) c)
    |> Seq.length
Or use Seq.skip, along with the fact that you can drop the conversion to a character array:
let countCh (s: string, n, c) =
    s
    |> Seq.skip (n + 1)
    |> Seq.filter ((=) c)
    |> Seq.length
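For comparison, here is a Python sketch of the same skip/filter/count pipeline (count_ch is a hypothetical name). Like the two library versions above, it counts strictly after index n, which reproduces the result of 2 from the question's example.

```python
from itertools import islice


def count_ch(s, n, c):
    """Count occurrences of c strictly after index n."""
    # islice(s, n + 1, None) lazily skips the first n + 1 characters, like Seq.skip;
    # unlike Seq.skip, it simply yields nothing when n + 1 exceeds the length.
    return sum(1 for ch in islice(s, n + 1, None) if ch == c)


print(count_ch("aaabbbccc", 3, 'b'))  # 2
```

For a plain string and non-negative n, s[n + 1:].count(c) gives the same answer.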

Confusing anonymous function construct

I'm reading through an F# tutorial, and ran into an example of syntax that I don't understand. The link to the page I'm reading is at the bottom. Here's the example from that page:
let rec quicksort2 = function
    | [] -> []
    | first::rest ->
        let smaller,larger = List.partition ((>=) first) rest
        List.concat [quicksort2 smaller; [first]; quicksort2 larger]
// test code
printfn "%A" (quicksort2 [1;5;23;18;9;1;3])
The part I don't understand is this: ((>=) first). What exactly is this? For contrast, this is an example from the MSDN documentation for List.partition:
let list1 = [ 1 .. 10 ]
let listEven, listOdd = List.partition (fun elem -> elem % 2 = 0) list1
printfn "Evens: %A\nOdds: %A" listEven listOdd
The first parameter (is this the right terminology?) to List.partition is obviously an anonymous function. I rewrote the line in question as this:
let smaller,larger = List.partition (fun e -> first >= e) rest
and it works the same as the example above. I just don't understand how this construct accomplishes the same thing: ((>=) first)
http://fsharpforfunandprofit.com/posts/fvsc-quicksort/
That's roughly the same thing as infix notation vs. prefix notation.
Operators are functions too and follow the same rules (i.e. they can be partially applied).
So here (>=) first is the operator >= with first already applied as its first operand, and it gives back a function waiting for the operator's second operand, as you noticed when rewriting that line.
This construct combines two features: operator call with prefix notation and partial function application.
First, let's look at calling operators with prefix notation.
let x = a + b
The above code calls operator + with two arguments, a and b. Since this is a functional language, everything is a function, including operators, including operator +. It's just that operators have this funny call syntax, where you put the function between the arguments instead of in front of them. But you can still treat the operator just as any other normal function. To do that, you need to enclose it in parentheses:
let x = (+) a b // same thing as a + b.
And when I say "as any other function", I totally mean it:
let f = (+)
let x = f a b // still same thing.
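Python has no (+) syntax, but the standard operator module exposes each operator as an ordinary function, which makes the same point in a short sketch:

```python
import operator

a, b = 2, 3
x1 = a + b               # ordinary infix call
x2 = operator.add(a, b)  # the same operation called like a normal function
f = operator.add         # ...and bound to a name like any other function
x3 = f(a, b)
print(x1, x2, x3)        # 5 5 5
```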
Next, let's look at partial function application. Consider this function:
let f x y = x + y
We can call it and get a number in return:
let a = f 5 6 // a = 11
But we can also "almost" call it by supplying only one of two arguments:
let a = f 5 // a is a function
let b = a 6 // b = 11
The result of such "almost call" (technically called "partial application") is another function that still expects the remaining arguments.
And now, let's combine the two:
let a = (+) 5 // a is a function
let b = a 6 // b = 11
In general, one can write the following equivalency:
(+) x === fun y -> x + y
Or, similarly, for your specific case:
(>=) first === fun y -> first >= y
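Python functions aren't curried, so partial application has to be requested explicitly, for example with functools.partial. As a sketch of the equivalence above, here is the tutorial's quicksort2 ported to Python, with partial(operator.ge, first) standing in for ((>=) first) and two list comprehensions standing in for List.partition:

```python
import operator
from functools import partial


def quicksort2(xs):
    if not xs:
        return []
    first, rest = xs[0], xs[1:]
    keep = partial(operator.ge, first)  # keep(e) == operator.ge(first, e) == (first >= e)
    smaller = [e for e in rest if keep(e)]
    larger = [e for e in rest if not keep(e)]
    return quicksort2(smaller) + [first] + quicksort2(larger)


print(quicksort2([1, 5, 23, 18, 9, 1, 3]))  # [1, 1, 3, 5, 9, 18, 23]
```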

how to compute the total number of constraints in smtlib2 files via the api

I used Z3_ast fs = Z3_parse_smtlib2_file(ctx,arg[1],0,0,0,0,0,0) to read the file.
To add it to the solver, I used expr F = to_expr(ctx,fs) and then s.add(F).
My question is: how can I get the total number of constraints in each instance?
I also tried F.num_args(); however, it gives the wrong size in some instances.
Is there a way to compute the total number of constraints?
Using Goal.size() may do what you want, after you add F to some goal. Here's a link to the Python API description; I'm sure you can find the equivalent in the C/C++ API: http://research.microsoft.com/en-us/um/redmond/projects/z3/z3.html#Goal-size
An expr F represents an abstract syntax tree, so F.num_args() returns the number of (one-step) children that F has, which is probably why what you've been trying doesn't always work. For example, suppose F = a + b, then F.num_args() = 2. But also, if F = a + b*c, then F.num_args() = 2 as well, where the children would be a and b*c (assuming usual order of operations). Thus, to compute the number of constraints (in case your definition is different than what Goal.size() yields), you can use a recursive method that traverses the tree.
I've included an example below highlighting all of these (z3py link here: http://rise4fun.com/Z3Py/It5E ).
For instance, my definition of constraint (or rather the complexity of an expression in some sense) might be the number of leaves or the depth of the expression. You can get as detailed as you want with this, e.g., counting different types of operands to fit whatever your definition of constraint might be, since it's not totally clear from your question. For instance, you might define a constraint as the number of equalities and/or inequalities appearing in an expression. This would probably need to be modified to work for formulas with quantifiers, arrays, or uninterpreted functions. Also note that Z3 may simplify things automatically (e.g., 1 - 1 gets simplified to 0 in the example below).
from z3 import *
a, b, c = Reals('a b c')
F = a + b
print(F.num_args())  # 2
F = a + b * c
print(F.num_args())  # 2
print(F.children())  # [a,b*c]
g = Goal()
g.add(F == 0)
print(g.size())  # number of constraints = 1
g.add(Or(F == 0, F == 1, F == 2, F == 3))
print(g.size())  # number of constraints = 2
print(g)
g.add(And(F == 0, F == 1, F == 2, F == 3))
print(g.size())  # number of constraints = 6
print(g)
def count_constraints(c, d, f):
    # Counts the leaves of f's expression tree, printing each node with its depth.
    print('depth: ' + str(d) + ' expr: ' + str(f))
    if f.num_args() == 0:
        return c + 1
    else:
        d += 1
        for a in f.children():
            c += count_constraints(0, d, a)
        return c
exp = a + b * c + a + c * c
print(count_constraints(0, 0, exp))
exp = And(a == b, b == c, a == 0, c == 0, b == 1 - 1)
print(count_constraints(0, 0, exp))
q, r, s = Bools('q r s')
exp = And(q, r, s)
print(count_constraints(0, 0, exp))
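To make the "count equalities and/or inequalities" definition concrete without depending on Z3, here's a sketch over a hypothetical toy tree of nested tuples. num_args and count_atoms are illustrative names that only mirror the idea of Z3's num_args()/children(); a real version would walk Z3 expressions instead.

```python
# Hypothetical toy AST: a leaf is a str or a number; an inner node is
# a tuple (operator, child1, child2, ...). This is NOT Z3's representation.
COMPARISONS = {"==", "!=", "<", "<=", ">", ">="}


def num_args(e):
    # One-step children only, like Z3's num_args().
    return len(e) - 1 if isinstance(e, tuple) else 0


def count_atoms(e):
    """Count comparison nodes anywhere in the tree (one possible meaning of 'constraint')."""
    if not isinstance(e, tuple):
        return 0
    op, *children = e
    if op in COMPARISONS:
        return 1
    return sum(count_atoms(c) for c in children)


F = ("+", "a", ("*", "b", "c"))            # a + b*c
formula = ("and", ("==", "a", "b"), ("==", "b", "c"),
           ("or", ("==", F, 0), ("==", F, 1)))
print(num_args(F))           # 2: the children are a and b*c
print(num_args(formula))     # 3 top-level arguments
print(count_atoms(formula))  # 4 comparisons in total
```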

how to convert a large number expressed on several bytes?

If a number is expressed on 4 bytes, from LSB to MSB, how do I convert it to an integer?
example:
<<77,0,0,0>> shall give 77
but
<<0,1,0,0>> shall give 256
Let S = <<0,1,0,0>>,
<<L1,L2,L3,L4>> = S,
L = L1*1 + L2*256 + L3*65536 + L4*16777216,
But it's not elegant ...
The bit syntax in Erlang does this in a very straightforward way:
<<A:32/little>> = <<0,1,0,0>>,
A.
% A = 256
or as a function:
decode(<<Int:32/little>>) -> Int.
% decode(<<0,1,0,0>>) =:= 256.
EDIT (this is the correct answer, and sorry for discovering it late...)
> binary:decode_unsigned(<<0,1,0,0>>,little).
256
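For comparison, Python's standard library offers the same little-endian, unsigned decoding as binary:decode_unsigned/2 and the <<Int:32/little>> pattern:

```python
import struct

data = bytes([0, 1, 0, 0])

# Any length, unsigned, least significant byte first
print(int.from_bytes(data, "little"))                 # 256
print(int.from_bytes(bytes([77, 0, 0, 0]), "little"))  # 77

# Fixed 32-bit unsigned little-endian, the closest analog of <<Int:32/little>>
(value,) = struct.unpack("<I", data)
print(value)                                           # 256
```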
The easier way would be something like:
decode_my_binary( <<A,B,C,D>> ) ->
    A + B*256 + C*65536 + D*16777216.
EDIT:
As per your edit, if you find this one not very elegant, you can try other approaches. Still, I think the above is the correct way of doing it. You can write a recursive function (not tested, but you get the idea):
decode( B ) -> decode(binary_to_list(B), 0, 1).
decode( [], R, _ ) -> R;
decode( [H|T], R, F) ->
    decode(T, R + H*F, F*256).
but this is clearly slower. Another possibility is to have the list of the binary digits and the list of multipliers and then fold it:
lists:sum(lists:zipwith(fun(X, Y) -> X*Y end,
                        binary_to_list(B), [1 bsl (8*X) || X <- [0,1,2,3]])).
Or if you want a variable number of digits:
fun(B, Digits) ->
    lists:sum(lists:zipwith(fun(X, Y) -> X*Y end,
                            binary_to_list(B), [1 bsl (8*X) || X <- lists:seq(0, Digits-1)]))
end.
where Digits is the number of bytes in B (lists:zipwith/3 requires both lists to have the same length).
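A variant that avoids the list of multipliers entirely is Horner's scheme: visit the bytes from most to least significant and multiply the running total by 256 before adding each byte. In Erlang this would be a fold (for example lists:foldl/3 over the reversed byte list); the sketch below uses Python's functools.reduce instead, and decode_little is an illustrative name. It stays in integer arithmetic and works for any number of bytes.

```python
from functools import reduce


def decode_little(data):
    """Little-endian bytes -> unsigned int via Horner's scheme.
    reversed() yields the most significant byte first; each step shifts the
    running total one byte to the left (* 256) and adds the next byte."""
    return reduce(lambda acc, byte: acc * 256 + byte, reversed(data), 0)


print(decode_little(bytes([77, 0, 0, 0])))     # 77
print(decode_little(bytes([0, 1, 0, 0])))      # 256
print(decode_little(bytes([0, 0, 0, 0, 1])))   # 4294967296, i.e. 256**4
```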

How does a simple calculator with parentheses work?

I want to learn how calculators work. For example, say we have inputs in infix notation like this:
1 + 2 x 10 - 2
The parser would have to respect common rules in math. In the above example this means:
1 + (2 x 10) - 2 = 19 (rather than 3 x 10 - 2 = 28)
And then consider this:
1 + 2 x ((2 / 9) + 7) - 2
Does it involve an Abstract Syntax Tree? A binary tree? How is the order of operations ensured to be mathematically correct? Must I use the shunting-yard algorithm to convert this to postfix notation? And then, how would I parse it in postfix notation? Why convert in the first place?
Is there a tutorial which shows how these relatively simple calculators are built? Or can someone explain?
One way to evaluate an expression is with a recursive descent parser.
http://en.wikipedia.org/wiki/Recursive_descent_parser
Here's an example grammar in BNF form:
http://en.wikipedia.org/wiki/Backus-Naur_form
Expr ::= Term ('+' Term | '-' Term)*
Term ::= Factor ('*' Factor | '/' Factor)*
Factor ::= ['-'] (Number | '(' Expr ')')
Number ::= Digit+
Here * means the preceding element is repeated zero or more times, + means one or more repeats, and square brackets mean optional.
The grammar ensures that the elements of highest precedence are collected together first, or in this case, evaluated first.
As you visit each node in the grammar, instead of building an abstract syntax tree, you evaluate the current node and return the value.
Example code (not perfect but should give you an idea of how to map BNF to code):
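# Input handling helpers (assumed; the original example left them out).
# The expression is kept in a global string with spaces removed.
text, pos = "", 0
def peek():
    return text[pos] if pos < len(text) else ""
def match(ch):
    global pos
    if peek() == ch:
        pos += 1
        return True
    return False
def peek_digit():
    return peek().isdigit()
def read_digit():
    global pos
    pos += 1
    return int(text[pos - 1])
def parse_expr():
    term = parse_term()
    while True:
        if match('+'):
            term = term + parse_term()
        elif match('-'):
            term = term - parse_term()
        else:
            return term
def parse_term():
    factor = parse_factor()
    while True:
        if match('*'):
            factor = factor * parse_factor()
        elif match('/'):
            factor = factor / parse_factor()
        else:
            return factor
def parse_factor():
    negate = -1 if match('-') else 1
    if peek_digit():
        return negate * parse_number()
    if match('('):
        expr = parse_expr()
        if not match(')'):
            raise SyntaxError("expected ')' at index %d" % pos)
        return negate * expr
    raise SyntaxError("expected a number or '(' at index %d" % pos)
def parse_number():
    num = 0
    while peek_digit():
        num = num * 10 + read_digit()
    return num
def evaluate(s):
    global text, pos
    text, pos = s.replace(" ", ""), 0
    value = parse_expr()
    if pos != len(text):
        raise SyntaxError("unexpected %r at index %d" % (peek(), pos))
    return value
print(evaluate("1 + 2 * 10 - 2"))  # 19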
To show how your example of 1 + 2 * 10 - 2 would evaluate:
call parse_expr                              stream is 1 + 2 * 10 - 2
  call parse_term
    call parse_factor
      call parse_number, which returns 1     stream is now + 2 * 10 - 2
  match '+'                                  stream is now 2 * 10 - 2
  call parse_term
    call parse_factor
      call parse_number, which returns 2     stream is now * 10 - 2
    match '*'                                stream is now 10 - 2
    call parse_factor
      call parse_number, which returns 10    stream is now - 2
    compute 2 * 10, return 20
  compute 1 + 20 -> 21
  match '-'                                  stream is now 2
  call parse_term
    call parse_factor
      call parse_number, which returns 2     stream is empty
  compute 21 - 2 -> 19
  return 19
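The question also asks about the shunting-yard route. For comparison, here is a minimal Python sketch of it: shunting-yard turns the tokens into postfix (RPN), and a simple stack machine evaluates the postfix. Postfix needs no parentheses or precedence rules at evaluation time, which is the usual reason for converting. The sketch assumes non-negative integer literals, the four left-associative operators written as + - * /, and parentheses; unary minus is not handled, and the function names are illustrative.

```python
import operator

OPS = {"+": (1, operator.add), "-": (1, operator.sub),
       "*": (2, operator.mul), "/": (2, operator.truediv)}


def tokenize(text):
    tokens, number = [], ""
    for ch in text:
        if ch.isdigit():
            number += ch
            continue
        if number:
            tokens.append(number)
            number = ""
        if ch in OPS or ch in "()":
            tokens.append(ch)
        elif not ch.isspace():
            raise ValueError(f"unexpected character {ch!r}")
    if number:
        tokens.append(number)
    return tokens


def to_postfix(tokens):
    """Shunting-yard for left-associative binary operators and parentheses."""
    output, stack = [], []
    for tok in tokens:
        if tok.isdigit():
            output.append(tok)
        elif tok in OPS:
            # Pop operators of greater or equal precedence (left associativity).
            while stack and stack[-1] in OPS and OPS[stack[-1]][0] >= OPS[tok][0]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        else:  # ")"
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("mismatched ')'")
            stack.pop()  # discard the matching "("
    while stack:
        if stack[-1] == "(":
            raise ValueError("mismatched '('")
        output.append(stack.pop())
    return output


def eval_postfix(postfix):
    """Operands are pushed; each operator pops two operands and pushes the result."""
    stack = []
    for tok in postfix:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok][1](left, right))
        else:
            stack.append(int(tok))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


postfix = to_postfix(tokenize("1 + 2 * 10 - 2"))
print(" ".join(postfix))      # 1 2 10 * + 2 -
print(eval_postfix(postfix))  # 19
```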
Try looking at ANTLR. It's what I used to build a custom compiler/parser, and it would apply easily to a calculator, which would be a very simple thing to create.
