How does a simple calculator with parentheses work? - parsing

I want to learn how calculators work. For example, say we have inputs in infix notation like this:
1 + 2 x 10 - 2
The parser would have to respect common rules in math. In the above example this means:
1 + (2 x 10) - 2 = 19 (rather than 3 x 10 - 2 = 28)
And then consider this:
1 + 2 x ((2 / 9) + 7) - 2
Does it involve an Abstract Syntax Tree? A binary tree? How is the order of operations ensured to be mathematically correct? Must I use the shunting-yard algorithm to convert this to postfix notation? And then, how would I parse it in postfix notation? Why convert in the first place?
Is there a tutorial which shows how these relatively simple calculators are built? Or can someone explain?

One way to evaluate an expression is with a recursive descent parser.
http://en.wikipedia.org/wiki/Recursive_descent_parser
Here's an example grammar in BNF form:
http://en.wikipedia.org/wiki/Backus-Naur_form
Expr ::= Term ('+' Term | '-' Term)*
Term ::= Factor ('*' Factor | '/' Factor)*
Factor ::= ['-'] (Number | '(' Expr ')')
Number ::= Digit+
Here * means the preceding element is repeated zero or more times, + means one or more repeats, and square brackets mean the element is optional.
The grammar ensures that the elements of highest precedence are collected together first, or in this case, evaluated first.
As you visit each node in the grammar, instead of building an abstract syntax tree, you evaluate the current node and return the value.
Example code (not perfect but should give you an idea of how to map BNF to code):
# Helpers left undefined here: match(c) consumes c if it is the next input
# character, peek_digit() tests whether the next character is a digit, and
# read_digit() consumes one digit and returns its numeric value.

def parse_expr():
    term = parse_term()
    while True:
        if match('+'):
            term = term + parse_term()
        elif match('-'):
            term = term - parse_term()
        else:
            return term

def parse_term():
    factor = parse_factor()
    while True:
        if match('*'):
            factor = factor * parse_factor()
        elif match('/'):
            factor = factor / parse_factor()
        else:
            return factor

def parse_factor():
    if match('-'):
        negate = -1
    else:
        negate = 1
    if peek_digit():
        return negate * parse_number()
    if match('('):
        expr = parse_expr()
        if not match(')'):
            raise SyntaxError("expected ')'")
        return negate * expr
    raise SyntaxError("expected a number or '('")

def parse_number():
    num = 0
    while peek_digit():
        num = num * 10 + read_digit()
    return num
To show how your example of 1 + 2 * 10 - 2 would evaluate:
call parse_expr                                 stream is 1 + 2 * 10 - 2
  call parse_term
    call parse_factor
      call parse_number, which returns 1        stream is now + 2 * 10 - 2
  match '+'                                     stream is now 2 * 10 - 2
  call parse_term
    call parse_factor
      call parse_number, which returns 2        stream is now * 10 - 2
    match '*'                                   stream is now 10 - 2
    call parse_factor
      call parse_number, which returns 10       stream is now - 2
    compute 2 * 10, return 20
  compute 1 + 20 -> 21
  match '-'                                     stream is now 2
  call parse_term
    call parse_factor
      call parse_number, which returns 2        stream is empty
  compute 21 - 2 -> 19
return 19
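The example code above leaves match, peek_digit and read_digit undefined. Here is a minimal self-contained sketch of the same grammar in Python, assuming the input is a plain string, that spaces can simply be dropped, and that multiplication is written as * rather than x. The Calculator class and the evaluate function are hypothetical names chosen for this illustration.

```python
class Calculator:
    """Recursive descent evaluator for the Expr/Term/Factor grammar above."""

    def __init__(self, text):
        self.text = text.replace(' ', '')  # assumption: spaces carry no meaning
        self.pos = 0

    def peek(self):
        # Next unread character, or '' at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ''

    def match(self, ch):
        # Consume ch if it is next; report whether it was consumed.
        if self.peek() == ch:
            self.pos += 1
            return True
        return False

    def parse_expr(self):
        value = self.parse_term()
        while True:
            if self.match('+'):
                value += self.parse_term()
            elif self.match('-'):
                value -= self.parse_term()
            else:
                return value

    def parse_term(self):
        value = self.parse_factor()
        while True:
            if self.match('*'):
                value *= self.parse_factor()
            elif self.match('/'):
                value /= self.parse_factor()
            else:
                return value

    def parse_factor(self):
        sign = -1 if self.match('-') else 1
        if self.peek().isdigit():
            return sign * self.parse_number()
        if self.match('('):
            value = self.parse_expr()
            if not self.match(')'):
                raise SyntaxError("expected ')' at position %d" % self.pos)
            return sign * value
        raise SyntaxError("expected a number or '(' at position %d" % self.pos)

    def parse_number(self):
        num = 0
        while self.peek().isdigit():
            num = num * 10 + int(self.peek())
            self.pos += 1
        return num


def evaluate(text):
    calc = Calculator(text)
    value = calc.parse_expr()
    if calc.pos != len(calc.text):
        raise SyntaxError("unexpected input at position %d" % calc.pos)
    return value
```

Calling evaluate("1 + 2 * 10 - 2") follows the same sequence of calls as the trace above. Because / is Python's true division, an input such as 1 + 2 * ((2 / 9) + 7) - 2 yields a float.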

Try looking at ANTLR. It is what I used to build a custom compiler/parser, and a calculator would be a very simple thing to create with it.

Related

How to write grammar for an expression when it can have many possible forms

I have some sentences that I need to convert to regex code and I was trying to use Pyparsing for it. The sentences are basically search rules, telling us what to search for.
Examples of sentences -
LINE_CONTAINS this is a phrase
-this is an example search rule telling that the line you are searching on should have the phrase this is a phrase
LINE_STARTSWITH However we - this is an example search rule telling that the line you are searching on should start with the phrase However we
The rules can be combined too, like- LINE_CONTAINS phrase one BEFORE {phrase2 AND phrase3} AND LINE_STARTSWITH However we
A list of all actual sentences (if necessary) can be found here.
All lines start with either of the 2 symbols mentioned above (call them line_directives). Now, I am trying to parse these sentences and then convert them to regex code. I started writing a BNF for my grammar and this is what I came up with -
lpar ::= '{'
rpar ::= '}'
line_directive ::= LINE_CONTAINS | LINE_STARTSWITH
phrase ::= lpar(?) + (word+) + rpar(?) # meaning if a phrase is parenthesized, its still the same
upto_N_words ::= lpar + 'UPTO' + num + 'WORDS' + rpar
N_words ::= lpar + num + 'WORDS' + rpar
upto_N_characters ::= lpar + 'UPTO' + num + 'CHARACTERS' + rpar
N_characters ::= lpar + num + 'CHARACTERS' + rpar
JOIN_phrase ::= phrase + JOIN + phrase
AND_phrase ::= phrase (+ JOIN + phrase)+
OR_phrase ::= phrase (+ OR + phrase)+
BEFORE_phrase ::= phrase (+ BEFORE + phrase)+
AFTER_phrase ::= phrase (+ AFTER + phrase)+
braced_OR_phrase ::= lpar + OR_phrase + rpar
braced_AND_phrase ::= lpar + AND_phrase + rpar
braced_BEFORE_phrase ::= lpar + BEFORE_phrase + rpar
braced_AFTER_phrase ::= lpar + AFTER_phrase + rpar
braced_JOIN_phrase ::= lpar + JOIN_phrase + rpar
rule ::= line_directive + subrule
final_expr ::= rule (+ AND/OR + rule)+
The problem is the subrule, for which (based on the empirical data I have) I have been able to come up with all of the following expressions -
subrule ::= phrase
::= OR_phrase
::= JOIN_phrase
::= BEFORE_phrase
::= AFTER_phrase
::= AND_phrase
::= phrase + upto_N_words + phrase
::= braced_OR_phrase + phrase
::= phrase + braced_OR_phrase
::= phrase + braced_OR_phrase + phrase
::= phrase + upto_N_words + braced_OR_phrase
::= phrase + upto_N_characters + phrase
::= braced_OR_phrase + phrase + upto_N_words + phrase
::= phrase + braced_OR_phrase + upto_N_words + phrase
To give an example, one sentence I have is LINE_CONTAINS the objective of this study was {to identify OR identifying} genes upregulated. For this the subrule as mentioned above is phrase + braced_OR_phrase + phrase.
So my question is how do I write a simple BNF grammar expression for the subrule so that I would be able to easily code the grammar for it using Pyparsing? Also, any input regarding my present technique is absolutely welcome.
EDIT: After applying the principles elucidated by @Paul in his answer, here is the MCVE version of the code. It takes a list of sentences to be parsed, hrrsents, parses each sentence, converts it to its corresponding regex and returns a list of regex strings -
from pyparsing import *
import re

def parse_hrr(hrrsents):
    UPTO, AND, OR, WORDS, CHARACTERS = map(Literal, "UPTO AND OR WORDS CHARACTERS".split())
    LBRACE,RBRACE = map(Suppress, "{}")
    integer = pyparsing_common.integer()
    LINE_CONTAINS, PARA_STARTSWITH, LINE_ENDSWITH = map(Literal,
        """LINE_CONTAINS PARA_STARTSWITH LINE_ENDSWITH""".split()) # put option for LINE_ENDSWITH. Users may use, I don't presently
    BEFORE, AFTER, JOIN = map(Literal, "BEFORE AFTER JOIN".split())
    keyword = UPTO | WORDS | AND | OR | BEFORE | AFTER | JOIN | LINE_CONTAINS | PARA_STARTSWITH

    class Node(object):
        def __init__(self, tokens):
            self.tokens = tokens
        def generate(self):
            pass

    class LiteralNode(Node):
        def generate(self):
            return "(%s)" %(re.escape(''.join(self.tokens[0]))) # here, merged the elements, so that re.escape does not have to do an escape for the entire list

    class ConsecutivePhrases(Node):
        def generate(self):
            join_these=[]
            tokens = self.tokens[0]
            for t in tokens:
                tg = t.generate()
                join_these.append(tg)
            seq = []
            for word in join_these[:-1]:
                if (r"(([\w]+\s*)" in word) or (r"((\w){0," in word): #or if the first part of the regex in word:
                    seq.append(word + "")
                else:
                    seq.append(word + "\s+")
            seq.append(join_these[-1])
            result = "".join(seq)
            return result

    class AndNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            join_these=[]
            for t in tokens[::2]:
                tg = t.generate()
                tg_mod = tg[0]+r'?=.*\b'+tg[1:][:-1]+r'\b)' # to place the regex commands at the right place
                join_these.append(tg_mod)
            joined = ''.join(ele for ele in join_these)
            full = '('+ joined+')'
            return full

    class OrNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            joined = '|'.join(t.generate() for t in tokens[::2])
            full = '('+ joined+')'
            return full

    class LineTermNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            ret = ''
            dir_phr_map = {
                'LINE_CONTAINS': lambda a: r"((?:(?<=^)|(?<=[\W_]))" + a + r"(?=[\W_]|$))456",
                'PARA_STARTSWITH':
                    lambda a: ( r"(^" + a + r"(?=[\W_]|$))457") if 'gene' in repr(a)
                    else (r"(^" + a + r"(?=[\W_]|$))458")}
            for line_dir, phr_term in zip(tokens[0::2], tokens[1::2]):
                ret = dir_phr_map[line_dir](phr_term.generate())
            return ret

    class LineAndNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            return '&&&'.join(t.generate() for t in tokens[::2])

    class LineOrNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            return '###'.join(t.generate() for t in tokens[::2])

    class UpToWordsNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            ret = ''
            word_re = r"([\w]+\s*)"
            for op, operand in zip(tokens[1::2], tokens[2::2]):
                # op contains the parsed "upto" expression
                ret += "(%s{0,%d})" % (word_re, op)
            return ret

    class UpToCharactersNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            ret = ''
            char_re = r"\w"
            for op, operand in zip(tokens[1::2], tokens[2::2]):
                # op contains the parsed "upto" expression
                ret += "((%s){0,%d})" % (char_re, op)
            return ret

    class BeforeAfterJoinNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            operator_opn_map = {'BEFORE': lambda a,b: a + '.*?' + b, 'AFTER': lambda a,b: b + '.*?' + a, 'JOIN': lambda a,b: a + '[- ]?' + b}
            ret = tokens[0].generate()
            for operator, operand in zip(tokens[1::2], tokens[2::2]):
                ret = operator_opn_map[operator](ret, operand.generate()) # this is basically calling a dict element, and every such element requires 2 variables (a&b), so providing them as ret and op.generate
            return ret

    ## THE GRAMMAR
    word = ~keyword + Word(alphas, alphanums+'-_+/()')
    uptowords_expr = Group(LBRACE + UPTO + integer("numberofwords") + WORDS + RBRACE).setParseAction(UpToWordsNode)
    uptochars_expr = Group(LBRACE + UPTO + integer("numberofchars") + CHARACTERS + RBRACE).setParseAction(UpToCharactersNode)
    some_words = OneOrMore(word).setParseAction(' '.join, LiteralNode)
    phrase_item = some_words | uptowords_expr | uptochars_expr
    phrase_expr = infixNotation(phrase_item,
                                [
                                ((BEFORE | AFTER | JOIN), 2, opAssoc.LEFT, BeforeAfterJoinNode), # was not working earlier, because BEFORE etc. were not keywords, and hence parsed as words
                                (None, 2, opAssoc.LEFT, ConsecutivePhrases),
                                (AND, 2, opAssoc.LEFT, AndNode),
                                (OR, 2, opAssoc.LEFT, OrNode),
                                ],
                                lpar=Suppress('{'), rpar=Suppress('}')
                                ) # structure of a single phrase with its operators
    line_term = Group((LINE_CONTAINS|PARA_STARTSWITH)("line_directive") +
                      (phrase_expr)("phrases")) # basically giving structure to a single sub-rule having line-term and phrase
    #
    line_contents_expr = infixNotation(line_term.setParseAction(LineTermNode),
                                       [(AND, 2, opAssoc.LEFT, LineAndNode),
                                        (OR, 2, opAssoc.LEFT, LineOrNode),
                                        ]
                                       ) # grammar for the entire rule/sentence
    ######################################
    mrrlist=[]
    for t in hrrsents:
        t = t.strip()
        if not t:
            continue
        try:
            parsed = line_contents_expr.parseString(t)
        except ParseException as pe:
            print(' '*pe.loc + '^')
            print(pe)
            continue
        temp_regex = parsed[0].generate()
        final_regexes3 = re.sub(r'gene','%s',temp_regex) # this can be made more precise by putting a condition of [non-word/^/$] around the 'gene'
        mrrlist.append(final_regexes3)
    return(mrrlist)
You have a two-tiered grammar here, so you would do best to focus on one tier at a time, which we have covered in some of your other questions. The lower tier is that of the phrase_expr, which will later serve as the argument to the line_directive_expr. So define examples of phrase expressions first - extract them from your list of complete statement samples. Your finished BNF for phrase_expr will have the lowest level of recursion look like:
phrase_atom ::= <one or more types of terminal items, like words of characters
                 or quoted strings, or *possibly* expressions of numbers of
                 words or characters> | brace + phrase_expr + brace
(Some other questions: Is it possible to have multiple phrase_items one after another with no operator? What does that indicate? How should it be parsed? interpreted? Should this implied operation be its own level of precedence?)
That will be sufficient to loop back the recursion for your phrase expression - you should not need any other braced_xxx element in your BNF. AND, OR, and JOIN are clearly binary operators - in normal operation precedence, AND's are evaluated before OR's, you can decide for yourself where JOIN should fall in this. Write some sample phrases with no parentheses, with AND and JOIN, and OR and JOIN, and think through what order of evaluation makes sense in your domain.
Once that is done, then line_directive_expr should be simple, since it is just:
line_directive_item ::= line_directive phrase_expr | brace line_directive_expr brace
line_directive_and ::= line_directive_item (AND line_directive_item)*
line_directive_or ::= line_directive_and (OR line_directive_and)*
line_directive_expr ::= line_directive_or
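To make the two tiers concrete, here is a rough hand-written recursive descent sketch of the BNF above in plain Python (not pyparsing). Several things in it are assumptions made for illustration: the precedence inside a phrase (adjacent atoms bind tightest, then AND, then OR), the rule that an AND/OR directly followed by a line directive belongs to the outer tier, and the names RuleParser and parse_rule. BEFORE, AFTER, JOIN and the UPTO forms are left out.

```python
import re

DIRECTIVES = {"LINE_CONTAINS", "LINE_STARTSWITH"}
OPERATORS = {"AND", "OR"}
STOP = DIRECTIVES | OPERATORS | {"{", "}"}  # tokens that end a run of plain words


class RuleParser:
    def __init__(self, text):
        # Braces become their own tokens; everything else splits on whitespace.
        self.tokens = re.findall(r"[{}]|[^\s{}]+", text)
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError("expected %r, got %r" % (expected, tok))
        self.pos += 1
        return tok

    def line_item_ahead(self, offset):
        # True if a line_directive_item starts at offset, possibly behind braces.
        while self.peek(offset) == "{":
            offset += 1
        return self.peek(offset) in DIRECTIVES

    # ---- lower tier: phrase expressions ----
    def phrase_atom(self):
        if self.peek() == "{":
            self.take("{")
            node = self.phrase_expr()
            self.take("}")
            return node
        words = []
        while self.peek() is not None and self.peek() not in STOP:
            words.append(self.take())
        if not words:
            raise SyntaxError("expected a phrase, got %r" % self.peek())
        return " ".join(words)

    def phrase_seq(self):
        # Adjacent atoms with no operator between them (assumed tightest binding).
        items = [self.phrase_atom()]
        while self.peek() == "{" or (self.peek() is not None and self.peek() not in STOP):
            items.append(self.phrase_atom())
        return items[0] if len(items) == 1 else ("SEQ",) + tuple(items)

    def phrase_and(self):
        node = self.phrase_seq()
        while self.peek() == "AND" and not self.line_item_ahead(1):
            self.take("AND")
            node = ("AND", node, self.phrase_seq())
        return node

    def phrase_expr(self):
        node = self.phrase_and()
        while self.peek() == "OR" and not self.line_item_ahead(1):
            self.take("OR")
            node = ("OR", node, self.phrase_and())
        return node

    # ---- upper tier: line directive expressions ----
    def line_item(self):
        if self.peek() == "{":
            self.take("{")
            node = self.line_expr()
            self.take("}")
            return node
        directive = self.take()
        if directive not in DIRECTIVES:
            raise SyntaxError("expected a line directive, got %r" % directive)
        return (directive, self.phrase_expr())

    def line_and(self):
        node = self.line_item()
        while self.peek() == "AND":
            self.take("AND")
            node = ("AND", node, self.line_item())
        return node

    def line_expr(self):
        node = self.line_and()
        while self.peek() == "OR":
            self.take("OR")
            node = ("OR", node, self.line_and())
        return node


def parse_rule(text):
    parser = RuleParser(text)
    tree = parser.line_expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected token %r" % parser.peek())
    return tree
```

The result is a nested tuple, which plays the role that the node classes play in a pyparsing version. The single braced atom rule is the only place where the phrase tier recurses, which is why no separate braced_xxx rules are needed.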
Then when you translate to pyparsing, add Groups and results names a little at a time! Don't immediately Group everything or name everything. Ordinarily I recommend using results names liberally, but in infix notation grammars, lots of results names can just clutter up the results. Let the Group (and ultimately node classes) do the structuring, and the behavior in the node classes will guide you where you want results names. For that matter, the results classes usually get such a simple structure that it is often easier just to do list unpacking in the class init or evaluate methods. Start with simple expressions and work up to complicated ones. (Look at "LINE_STARTSWITH gene" - it is one of your simplest test cases, but you have it as #97?) If you just sort this list by length order, that would be a good rough cut. Or sort by increasing number of operators. But if you tackle the complex cases before you have the simple ones working, you will have too many options for where a tweak or refinement should go, and (speaking from personal experience) you are as likely to get it wrong as to get it right - and when you get it wrong, it just makes fixing the next issue more difficult.
And again, as we have discussed elsewhere, the devil in this second tier is doing the actual interpretation of the various line directive items, since there is an implied order to evaluating LINE_STARTSWITH vs LINE_CONTAINS that overrides the order that they may be found in the initial string. That ball is entirely in your court, since you are the language designer for this particular domain.

Expression trees parsing and execution of operations

I got this problem in a coding challenge. I couldn't solve it on time but I still want to know how could it be done. I am not very familiar with expression trees and I found it hard to model the problem. The description goes like this:
Input: expression_tree | sequence_of_operations
The input is a single line of text with an expression tree and a sequence of operations separated by a | character and ended by a \n newline character. Spaces are allowed in the input but should be ignored.
The expression tree is a sequence of 1-character variables A-Z, with sub expression trees formed by parentheses (expression_tree). Examples: AB, A(B C D), (AB)C((DE)F)
The sequence of operations is a string of the characters R (reverse) or S (simplify).
Reverse means reverse the order of everything in expression tree. Applying reverse twice in a row cancels out.
Example: (AB)C((DE)F) | R should print (F(ED))C(BA)
Simplify means remove the parentheses around the very first element in the expression tree and each of its subexpression trees. Applying S multiple times should have same result as applying S once.
Example: (AB)C((DE)F) | S should print ABC(DEF)
Output: Read the expression tree and apply the sequence of operations from left to right to the expression tree, print out the result without characters.
What I would like to know the most is how to model the expression tree to handle the parentheses, and how the simplify operation should work.
'''
Examples are as follows :
INPUT:
(AB)(C(DE))/SS
(((AB)C)D)E/SS
(((AB)C)D)E/SR
(AB)C((DE)F)/SRS
(AB(CD((EF)G)H))(IJ)/SRS
(AB(CD((EF)G)H))(IJ)/SSSRRRS
(A(B(C)D)E(FGH(IJ)))/SRRSSRSSRRRSSRS
(A(BC(D(E((Z)K(L)))F))GH(IJK))/S
-------------------------------
OUTPUT:
AB(C(DE))
ABCDE
EDCBA
FEDCBA
JI(H(GFE)DC)BA
JI(H(GFE)DC)BA
JIHGFE(D(C)B)A
A(BC(D(E(ZK(L)))F))GH(IJK)
'''

'''
operationReverse function returns a reversed expression tree
Example : AB(CD) -> (DC)BA
'''
def operationReverse(expression):
    '''============== Reversing the whole expressions ================'''
    expression = expression[::-1]
    expression = list(expression)
    '''========= Replace Closing brace with Opening brace and vice versa ========='''
    for x in range(0, len(expression)):
        if(expression[x] != ')' and expression[x] != '('):
            continue
        elif(expression[x] == ")"):
            expression[x] = "("
        else:
            expression[x] = ")"
    expression = ''.join(expression)
    return expression

'''
operationSimplify function returns a simplified expression tree
Example : (AB)(C(DE)) -> AB(C(DE))
operationSimplify uses recursion
'''
def operationSimplify(expression):
    '''========= If no parenthesis found then return the expression as it is because it is already simplified ========='''
    '''========= This is also the base condition to stop recursion ============='''
    if(expression.find('(')==-1):
        return expression
    '''If 1st character is opening brace then find its corresponding closing brace and remove them and call the function by passing the values between the opening and closing brace'''
    if(expression[0] == '('):
        x = 1
        #numOfOpeningBrackets = maintains the count of opening brackets for finding its corresponding closing bracket
        numOfOpeningBrackets = 1
        while(x < len(expression)):
            if(expression[x] != ')' and expression[x] != '('):
                x = x + 1
                continue
            elif(expression[x] == "("):
                numOfOpeningBrackets = numOfOpeningBrackets + 1
                x = x + 1
            else:
                numOfOpeningBrackets = numOfOpeningBrackets - 1
                if(numOfOpeningBrackets == 0):
                    posOfCloseBracket = x
                    break
                x = x + 1
        expression = operationSimplify(expression[1:posOfCloseBracket]) + expression[posOfCloseBracket+1:]
    '''========= If no parenthesis found then return the expression as it is because it is already simplified ========='''
    if(expression.find('(')==-1):
        return expression
    '''========= Find the opening brace and its closing brace and new expression tree will be concatenation of start of string till opening brace including the brace and string within the opening brace and closing brace passed as an argument to the function itself and the remaining string ========='''
    x = 0
    #numOfOpeningBrackets = maintains the count of opening brackets for finding its corresponding closing bracket
    recursion = False
    numOfOpeningBrackets = 0
    while (x < len(expression)):
        if(expression[x] != ')' and expression[x] != '('):
            x = x + 1
        elif(expression[x] == "("):
            if(numOfOpeningBrackets == 0 or recursion == True):
                numOfOpeningBrackets = 0
                recursion = False
                posOfStartBracket = x
                y = x
            numOfOpeningBrackets = numOfOpeningBrackets + 1
            x = x + 1
        else:
            numOfOpeningBrackets = numOfOpeningBrackets - 1
            if(numOfOpeningBrackets == 0):
                posOfCloseBracket = x
                x = y
                expression=expression[0:posOfStartBracket+1]+operationSimplify(expression[posOfStartBracket+1:posOfCloseBracket])+expression[posOfCloseBracket:]
                recursion = True
            x = x + 1
    return expression

'''
solution function prints the final result
'''
def solution(inputString):
    '''========= Remove the spaces from the input ==============='''
    #inputString = inputString.replace("\n","")
    inputString = inputString.replace(" ","")
    inputString = inputString.replace("\t","")
    #inputString = inputString.replace("()","")
    '''=============== The substring before '/' is expression tree and substring after '/' is sequence of operations ======================'''
    #posOfSlash = Position Of Slash Character
    posOfSlash = inputString.find('/')
    if(posOfSlash == -1):
        print (inputString)
        return
    #expressionTree = Expression Tree
    expressionTree = inputString[0:posOfSlash]
    #seqOfOp = sequence of operations to be performed
    seqOfOp = inputString[posOfSlash+1:]
    '''============ If sequence Of Operations is empty then print the expression tree as it is ============== '''
    if(len(seqOfOp)==0):
        print(expressionTree)
        return
    '''============= Removing all the pairs of RR from the sequence Of Operations =================='''
    seqOfOp = seqOfOp.replace(r+r,'')
    '''============ All multiple S are replaced by one single S ================'''
    while(seqOfOp.find(s+s) != -1):
        seqOfOp = seqOfOp.replace(s+s,s)
    '''============ If to perform operation R then call operationReverse() else if to perform operation S call operationSimplify() ================'''
    for x in range (0 , len(seqOfOp)):
        if(seqOfOp[x] == r):
            expressionTree = operationReverse(expressionTree)
        else :
            expressionTree = operationSimplify(expressionTree)
    print(expressionTree)
    return

'''======= Global variables r and s representing operations R and S'''
r = 'R'
s = 'S'
while True:
    try:
        inputString = input()
        '''==================== Calling function solution ======================'''
        solution(inputString)
    except EOFError:
        break
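The code above works directly on the string. As an alternative way to model the problem, here is a short sketch that parses the input into nested Python lists, so that R and S become small recursive functions. It assumes the | separator from the problem statement, and all function names are made up for this illustration.

```python
def parse_tree(text):
    """Turn 'A(BC)' into nested lists: ['A', ['B', 'C']]."""
    stack = [[]]  # stack[-1] is the list currently being filled
    for ch in text:
        if ch.isspace():
            continue
        if ch == '(':
            stack.append([])
        elif ch == ')':
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(ch)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def reverse(tree):
    # Reverse the order at this level and inside every subtree.
    return [reverse(n) if isinstance(n, list) else n for n in reversed(tree)]


def simplify(tree):
    # Simplify the subtrees first, then splice the first element into
    # this level if it is itself a (now simplified) subtree.
    nodes = [simplify(n) if isinstance(n, list) else n for n in tree]
    if nodes and isinstance(nodes[0], list):
        nodes = nodes[0] + nodes[1:]
    return nodes


def render(tree):
    return ''.join('(' + render(n) + ')' if isinstance(n, list) else n
                   for n in tree)


def apply_operations(line):
    tree_text, _, ops = line.partition('|')
    tree = parse_tree(tree_text)
    for op in ''.join(ops.split()):  # ignore spaces and the trailing newline
        if op == 'R':
            tree = reverse(tree)
        elif op == 'S':
            tree = simplify(tree)
        else:
            raise ValueError("unknown operation %r" % op)
    return render(tree)
```

With this model, simplify never has to search for a matching parenthesis: the parentheses around the first element are removed simply by splicing the first child list into its parent.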

Recursive descent parsing: high precedence unary operators

I've figured out how to implement binary operators with precedence, like this (pseudocode):
method plus
    times()
    while(consume(plus_t)) do
        times()
    end
end
method times
    number()
    while(consume(times_t))
        number()
    end
end
// plus() is the root operation
// omitted: number() consumes a number token
So when I parse 4 + 5 * 6 it would:
plus
    times
        number (4 consumed)
    plus_t consumed
    times
        number (5 consumed)
        times_t consumed
        number (6 consumed)
However, when I try adding a minus method (prefix minusing like -4, not infix minusing like 4 - 5):
method minus
    consume(minus_t)
    plus()
end
It takes a very low precedence, so -4 + 5 becomes -(4 + 5) rather than (-4) + 5 and this is undesirable.
What can I do to make a high precedence unary operator?
You've not said where in the hierarchy you're adding the minus method, but it looks like you're adding it above plus and making it the root.
You need to put it last in the chain of calls, directly above number, if you want unary - to have a higher precedence than + and *.
In your pseudocode, something like this should work:
method times
    minus()
    while(consume(times_t))
        minus()
    end
end
method minus
    if(consume(minus_t))
        // next number should have a unary minus attached
        number()
    else
        number()
    end
end
I'm learning about parsers these days, so I wrote a complete parser based on your pseudocode, it's in LiveScript, but should be easy to follow.
Edit: Running example on jsfiddle.net - http://jsfiddle.net/Dogbert/7Pmwc/
parse = (string) ->
  index = 0
  is-digit = (d) -> '0' <= d <= '9'
  plus = ->
    str = times()
    while consume "+"
      str = "(+ #{str} #{times()})"
    str
  times = ->
    str = unary-minus()
    while consume "*"
      str = "(* #{str} #{unary-minus()})"
    str
  unary-minus = ->
    if consume "-"
      "(- #{number()})"
    else
      number()
  number = ->
    if is-digit peek()
      ret = peek()
      advance()
      while is-digit peek()
        ret += peek()
        advance()
      ret
    else
      throw "expected number at index = #{index}, got #{peek()}"
  peek = ->
    string[index]
  advance = ->
    index++
  consume = (what) ->
    if peek() == what
      advance()
      true
  plus()
console.log parse "4+5*6"
console.log parse "-4+5"
console.log parse "-4*-5+-4"
Output:
(+ 4 (* 5 6))
(+ (- 4) 5)
(+ (* (- 4) (- 5)) (- 4))
PS: you may want to look at Operator-precedence Parsers for parsing complex precedence/associativity relatively easily.
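For readers who don't know LiveScript, here is a rough Python port of the same idea. One deliberate difference, which is a variation and not part of the code above: unary calls itself after consuming a minus, so repeated signs such as --4 are accepted too. The name to_sexpr is made up for this sketch.

```python
def to_sexpr(text):
    """Parse digits, '+', '*' and prefix '-' into an S-expression string."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ''

    def consume(ch):
        nonlocal pos
        if peek() == ch:
            pos += 1
            return True
        return False

    def number():
        nonlocal pos
        start = pos
        while peek().isdigit():
            pos += 1
        if start == pos:
            raise SyntaxError("expected number at index %d, got %r" % (pos, peek()))
        return text[start:pos]

    def unary():
        # Called from times(), below both binary operators, so the minus
        # binds tighter than '*' and '+'.
        if consume('-'):
            return '(- %s)' % unary()
        return number()

    def times():
        node = unary()
        while consume('*'):
            node = '(* %s %s)' % (node, unary())
        return node

    def plus():
        node = times()
        while consume('+'):
            node = '(+ %s %s)' % (node, times())
        return node

    result = plus()
    if pos != len(text):
        raise SyntaxError("unexpected %r at index %d" % (peek(), pos))
    return result
```

The three inputs used above ("4+5*6", "-4+5" and "-4*-5+-4") give the same S-expressions as the LiveScript version.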

Parsing percent expressions with antlr4

I'm trying to parse algebraic expressions with ANTLR4. One feature I tried to accomplish with my parser is the "intelligent" handling of percent expressions.
Edit: The goal is to make the calculation of discounts or tips in a restaurant easier. E.g. if you see an advert "30% off" you could enter "price - 30%" and get the correct result. Or in a restaurant you could enter the price of your meal plus 15% and get the sum you have to pay including a tip of 15%. But this interpretation should only occur if the expression looks like "expression1 (- or +) expression2". In all other cases the percent sign should be interpreted as usual. The Google Search box calculator behaves like that./Edit
100-30% should return 70
100-(20+10)% should also return 70
3+(100-(20+10)%) should return 73
but
5% should return 0.05
(5+5)% should return 0.10
My grammar looks like this:
expr:
e EOF
;
e:
'-'a=e
| '(' a=e ')'
| a=e op=(ADD|SUB) b=e '%'
| a=e op=(ADD|SUB) b=e
| a=e'%' //**PERCENTRULE**
| FLT
;
ADD : '+' ;
SUB : '-' ;
FLT: [0-9]+(('.'|',')[0-9]+)?;
NEWLINE:'\r'? '\n' ;
WS : [ \t\n]+ -> skip ;
For the expression 100-30% I would expect this tree:
But I get this:
How can I get the correct tree (without deleting PERCENTRULE)?
I deleted my original grammar-based answer because I realized I had a very different idea of what kind of handling you were trying to accomplish. It sounds like you want anything in the form X op Y % to become X * (1 op (Y/100)) instead. Is that accurate?
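If that reading is right, the semantics (independent of how the grammar recognizes the pattern) could be sketched in Python as follows. The function name apply_percent is hypothetical, and the formula is written as X op (X * Y / 100), which is algebraically the same as X * (1 op (Y/100)).

```python
def apply_percent(x, op, y):
    """Evaluate 'x op y%' as a percentage change relative to x."""
    delta = x * y / 100  # y percent of x
    if op == '+':
        return x + delta
    if op == '-':
        return x - delta
    raise ValueError("op must be '+' or '-'")


discounted = apply_percent(100, '-', 30)        # 100-30%
nested = 3 + apply_percent(100, '-', 20 + 10)   # 3+(100-(20+10)%)
```

A tree walker or ANTLR visitor would call something like this for the percent alternative and fall back to plain division by 100 for a lone 5%.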
One feature I tried to accomplish with my parser is the "intelligent" handling of percent expressions:
Are you sure your specification for this is solid enough to even begin coding? It looks quite confusing to me, especially since % is more like a units-designation.
For example, I would have expected 50-30% to be either one of these:
(50 - 0.3) = 49.3
(50 - 30) / 100 = 0.20
...but what you're asking for sounds stranger still: 50 * (1 - 0.3) = 35.
That opens up additional weirdness. Wouldn't both of these be true?
0+5% would become 0 * (1 + 0.05) = 0
5% would become 5 / 100 = 0.05
This is odd because adding zero usually doesn't change what the number means.
A more-restrictive version
OK, what about allowing percentage-based changes only if the user avoids ambiguity? One way would be to create new binary operators like A -% B or A +% B, but that's not quite human-centric, so how about:
expr: e EOF ;
e
: SUB e
| parenExpr
| percentOp
| binaryOp
| FLT
;
parenExpr
: LPAREN e RPAREN
;
percentOp
: (FLT|parenExpr) (ADD|SUB) (FLT|parenExpr) PCT
;
binaryOp
: e (ADD|SUB|MUL|DIV) e
;
PCT : '%';
LPAREN : '(';
RPAREN : ')';
ADD : '+' ;
SUB : '-' ;
MUL : '*' ;
DIV : '/' ;
FLT: [0-9]+(('.'|',')[0-9]+)?;
NEWLINE:'\r'? '\n' ;
WS : [ \t\n]+ -> skip ;
This would mean:
50-5-4% is treated as 50-(5-4%) to get 45.2.
5% is not valid (on its own)
5%+4 is not valid

how to compute the number of total constraints in smtlib2 files in api

I used the Z3_ast fs = Z3_parse_smtlib2_file(ctx,arg[1],0,0,0,0,0,0) to read file.
To add it to the solver, I then used expr F = to_expr(ctx,fs) followed by s.add(F).
My question is how can I get the number of total constraints in each instance?
I also tried F.num_args(); however, it gives the wrong size in some instances.
Are there any ways to compute the total constraints?
Using Goal.size() may do what you want, after you add F to some goal. Here's a link to the Python API description, I'm sure you can find the equivalent in the C/C++ API: http://research.microsoft.com/en-us/um/redmond/projects/z3/z3.html#Goal-size
An expr F represents an abstract syntax tree, so F.num_args() returns the number of (one-step) children that F has, which is probably why what you've been trying doesn't always work. For example, suppose F = a + b, then F.num_args() = 2. But also, if F = a + b*c, then F.num_args() = 2 as well, where the children would be a and b*c (assuming usual order of operations). Thus, to compute the number of constraints (in case your definition is different than what Goal.size() yields), you can use a recursive method that traverses the tree.
I've included an example below highlighting all of these (z3py link here: http://rise4fun.com/Z3Py/It5E ).
For instance, my definition of constraint (or rather the complexity of an expression in some sense) might be the number of leaves or the depth of the expression. You can get as detailed as you want with this, e.g., counting different types of operands to fit whatever your definition of constraint might be, since it's not totally clear from your question. For instance, you might define a constraint as the number of equalities and/or inequalities appearing in an expression. This would probably need to be modified to work for formulas with quantifiers, arrays, or uninterpreted functions. Also note that Z3 may simplify things automatically (e.g., 1 - 1 gets simplified to 0 in the example below).
from z3 import *

a, b, c = Reals('a b c')
F = a + b
print(F.num_args()) # 2
F = a + b * c
print(F.num_args()) # 2
print(F.children()) # [a,b*c]
g = Goal()
g.add(F == 0)
print(g.size()) # number of constraints = 1
g.add(Or(F == 0, F == 1, F == 2, F == 3))
print(g.size()) # number of constraints = 2
print(g)
g.add(And(F == 0, F == 1, F == 2, F == 3))
print(g.size()) # number of constraints = 6
print(g)

# Counts the leaves of expression f; c is the running count, d the current depth
def count_constraints(c,d,f):
    print('depth: ' + str(d) + ' expr: ' + str(f))
    if f.num_args() == 0:
        return c + 1
    else:
        d += 1
        for a in f.children():
            c += count_constraints(0, d, a)
        return c

exp = a + b * c + a + c * c
print(count_constraints(0,0,exp))
exp = And(a == b, b == c, a == 0, c == 0, b == 1 - 1)
print(count_constraints(0,0,exp))
q, r, s = Bools('q r s')
exp = And(q, r, s)
print(count_constraints(0,0,exp))

Resources