Clang IfStmt with shortcut binary operator in condition - clang

I am trying to detect whether there is a function call inside the condition of an if statement, like the following:
if (cmp(a, b)){
// do something
}
I have found I can do this with an AST matcher in the following manner:
Matcher.addMatcher(ifStmt(hasCondition(callExpr().bind("call_expr")))
.bind("call_if_stmt"),&handleMatch);
But the problem is that the condition could contain short-circuit operators such as && and ||, like the following:
if(a != b && cmp(a,b) || c == 10){
// do something
}
Now this condition has the binary operators && and ||, and also has a call expression as part of it. How can I detect that there is a call expression inside this if statement? I don't know in advance how many binary operators there will be, so I am looking for a general solution, possibly using a Clang AST matcher.

In the first case, if(cmp(a,b)), the CallExpr node is a direct child of the IfStmt. In the second case, it is a descendant of the IfStmt but not a child; it is nested beneath two BinaryOperator nodes. (I found this out by looking at the AST with clang-check -ast-dump test.cpp --.) Adding a hasDescendant traversal matcher will find the more deeply nested CallExpr. Unfortunately, that alone will not find the first case, because hasDescendant only looks below the condition expression and never tests the condition node itself. So we can use anyOf to combine it with the original matcher:
ifStmt(
  hasCondition(
    anyOf(
      callExpr().bind("top_level_call_expr"),
      hasDescendant(
        callExpr().bind("nested_call_expr")
      )
    )
  )
).bind("call_if_stmt")
If I take test.cpp to have the following code:
bool cmp(int a, int b){return a < b;}
int f(int a, int c){
int b = 42;
if( a != b && cmp(a,b) || c == 10){
return 2;
}
return c;
}
int g(int a, int c){
int b = 42;
if( cmp(a,b)) {
return 2;
}
return c;
}
then I can test this with clang-query test.cpp --:
clang-query> let m2 ifStmt( hasCondition( anyOf(callExpr().bind("top_level_call_expr"),hasDescendant(callExpr().bind("nested_call_expr"))))).bind("call_if_stmt")
clang-query> m m2
Match #1:
/path/to/test.cpp:5:7: note: "call_if_stmt" binds here
if( a != b && cmp(a,b) || c == 10){
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/test.cpp:5:21: note: "nested_call_expr" binds here
if( a != b && cmp(a,b) || c == 10){
^~~~~~~~
/path/to/test.cpp:5:7: note: "root" binds here
if( a != b && cmp(a,b) || c == 10){
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Match #2:
/path/to/test.cpp:13:7: note: "call_if_stmt" binds here
if( cmp(a,b)) {
^~~~~~~~~~~~~~~
/path/to/test.cpp:13:7: note: "root" binds here
if( cmp(a,b)) {
^~~~~~~~~~~~~~~
/path/to/test.cpp:13:11: note: "top_level_call_expr" binds here
if( cmp(a,b)) {
^~~~~~~~
2 matches.
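The difference between "the condition is a call" and "the condition contains a call somewhere below it" can be sketched with a toy tree. This is only an illustration in plain C++: Node, make, hasDescendantOfKind, conditionContainsCall, directCall and nestedCall are hypothetical names, not part of Clang, and the tree shapes are simplified versions of what -ast-dump shows.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an AST node (hypothetical; not Clang's Stmt/Expr classes).
struct Node {
    std::string kind;                         // e.g. "CallExpr", "BinaryOperator"
    std::vector<std::shared_ptr<Node>> kids;  // direct children
};

std::shared_ptr<Node> make(std::string kind,
                           std::vector<std::shared_ptr<Node>> kids = {}) {
    return std::make_shared<Node>(Node{std::move(kind), std::move(kids)});
}

// Mirrors the idea behind hasDescendant: true only if some node strictly
// below `n` has the given kind. `n` itself is never tested.
bool hasDescendantOfKind(const Node& n, const std::string& kind) {
    for (const auto& k : n.kids) {
        if (k->kind == kind || hasDescendantOfKind(*k, kind)) return true;
    }
    return false;
}

// Mirrors anyOf(callExpr(), hasDescendant(callExpr())) applied to a condition.
bool conditionContainsCall(const Node& cond) {
    return cond.kind == "CallExpr" || hasDescendantOfKind(cond, "CallExpr");
}

// Simplified condition tree for: cmp(a, b)
std::shared_ptr<Node> directCall() { return make("CallExpr"); }

// Simplified condition tree for: a != b && cmp(a, b) || c == 10
std::shared_ptr<Node> nestedCall() {
    return make("BinaryOperator",                                   // ||
                {make("BinaryOperator",                             // &&
                      {make("BinaryOperator"), make("CallExpr")}),  // a != b, cmp(a, b)
                 make("BinaryOperator")});                          // c == 10
}
```

With these trees, the descendant-only check misses directCall() but finds nestedCall(), while the combined check accepts both, which is the same reason the matcher above needs anyOf.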


Binary tree traversal Inorder output is wrong why?

Can someone explain why my output is wrong and how to fix it?
For example, if I input A B C D E,
the output is A B C D E
instead of the in-order traversal D B E A C.
This is my code:
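#include <stdio.h>
#include <stdlib.h>

/* Assumed definition: the question did not show it, but the code below
   uses exactly these three fields. */
struct node
{
    char data;
    struct node *left, *right;
};

struct node *Create(struct node *root, char item);
void Inorder(struct node *root);

int main()
{
    struct node *root = NULL;
    int choice, n;
    char item;
    do
    {
        printf("\n1. Insert Node");
        printf("\n2. Traverse in Inorder");
        printf("\n3. Exit");
        printf("\nEnter Choice : ");
        scanf("%d",&choice);
        switch(choice)
        {
        case 1:
            root = NULL;
            printf("\n\n Nodes : ");
            scanf("%d",&n);
            for(int i = 1; i <= n; i++)
            {
                printf("\nEnter data for node %d : ", i);
                scanf(" %c",&item);
                root = Create(root,item);
            }
            break;
        case 2:
            printf("\nBST Traversal in INORDER \n");
            Inorder(root); break;
        case 3:
            break; /* leave the loop without printing an error */
        default:
            printf("\n\nINVALID OPTION TRY AGAIN\n\n"); break;
        }
    } while(choice != 3);
}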
struct node *Create(struct node *root, char item)
{
    if(root == NULL)
    {
        root = (struct node *)malloc(sizeof(struct node));
        root->left = root->right = NULL;
        root->data = item;
        return root;
    }
    else
    {
        if(item < root->data )
            root->left = Create(root->left,item);
        else if(item > root->data )
            root->right = Create(root->right,item);
        else
            printf(" Duplicate Element !! Not Allowed !!!");
        return(root);
    }
}
void Inorder(struct node *root)
{
    if( root != NULL)
    {
        Inorder(root->left);
        printf(" %c ",root->data);
        Inorder(root->right);
    }
}
I double-checked the in-order traversal algorithm, but my output is still wrong and I don't understand why. Did I miss something here?
The result is as expected. The in-order traversal should not produce D B E A C for your input of A B C D E.
This is how the tree is constructed.
First the root is created with value A.
Then B is inserted. As B > A, it is inserted as a right child of the root:
A
 \
  B
Then C is inserted. As C > A, it is inserted in the right subtree. There again we find C > B, so the new node will be inserted as a right child of B:
A
 \
  B
   \
    C
In the same way D and then E are inserted, giving this tree:
A
 \
  B
   \
    C
     \
      D
       \
        E
Note that this tree is not balanced at all. That's what happens when you insert nodes in their lexical order. If you inserted them in a more random order, we would expect the tree to be more balanced.
But it does not actually matter for the in-order traversal. What you have implemented is a binary search tree (BST), and one important property of BSTs is that their in-order traversal always produces the data in sorted order. So irrespective of the order in which you input the letters A B C D and E, the in-order traversal should always output this sequence:
A B C D E
This is correct.
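As a small check of that property, here is a minimal C++ sketch (not the original C program) that uses the same insertion rule as Create and collects the in-order traversal into a string. The names insert, inorder and inorderOf are made up for this illustration.

```cpp
#include <memory>
#include <string>

struct Node {
    char data;
    std::unique_ptr<Node> left, right;
    explicit Node(char c) : data(c) {}
};

// Same rule as Create() in the question: smaller goes left,
// larger goes right, duplicates are ignored.
void insert(std::unique_ptr<Node>& root, char item) {
    if (!root) { root = std::make_unique<Node>(item); return; }
    if (item < root->data) insert(root->left, item);
    else if (item > root->data) insert(root->right, item);
}

// In-order traversal: left subtree, node, right subtree.
void inorder(const Node* n, std::string& out) {
    if (!n) return;
    inorder(n->left.get(), out);
    out += n->data;
    inorder(n->right.get(), out);
}

// Builds a BST from `letters` (inserted left to right) and
// returns its in-order traversal as a string.
std::string inorderOf(const std::string& letters) {
    std::unique_ptr<Node> root;
    for (char c : letters) insert(root, c);
    std::string out;
    inorder(root.get(), out);
    return out;
}
```

Calling inorderOf("ABCDE"), inorderOf("CAEBD") and inorderOf("EDCBA") builds three differently shaped trees, yet each call returns "ABCDE".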

Abstract Syntax Tree for Source Code including Expressions

I am building a new simple programming language (just to learn how compilers work in my free time).
I have already built a lexer which can tokenize my source code into lexemes.
However, I am now stuck on how to form an Abstract Syntax Tree from the tokens, where the source code might contain an expression (with operator precedence).
For simplicity, I shall include only 4 basic operators: +, -, /, and * in addition to brackets (). Operator precedence will follow BODMAS rule.
I realize I might be able to convert the expression from infix to prefix/postfix, form the tree and substitute it.
However, I am not sure if that is possible. Even if it is possible, I am not sure how efficient it might be or how difficult it might be to implement.
Is there some trivial way to form the tree in-place without having to convert to prefix/postfix first?
I came across the Shunting Yard algorithm which seems to do this. However, I found it to be quite a complicated algorithm. Is there something simpler, or should I go ahead with implementing the Shunting Yard algorithm?
Currently, the following program is tokenized by my lexer as follows:
I am demonstrating using a Java program for syntax familiarity.
Source Program:
public class Hello
{
public static void main(String[] args)
{
int a = 5;
int b = 6;
int c = 7;
int r = a + b * c;
System.out.println(r);
}
}
Lexer output:
public
class
Hello
{
public
static
void
main
(
String
[
]
args
)
{
int
a
=
5
;
int
b
=
6
;
int
c
=
7
;
int
r
=
a
+
b
*
c
;
System
.
out
.
println
(
r
)
;
}
}
// I know this might look ugly that I use a global variable ret to return parsed subtrees
// but please bear with it, I got used to this for various performance/usability reasons
var ret, tokens
function get_precedence(op) {
// this is an essential part, cannot parse an expression without the precedence checker
if (op == '*' || op == '/' || op == '%') return 14
if (op == '+' || op == '-') return 13
if (op == '<=' || op == '>=' || op == '<' || op == '>') return 11
if (op == '==' || op == '!=') return 10
if (op == '^') return 8
if (op == '&&') return 6
if (op == '||') return 5
return 0
}
function parse_primary(pos) {
// in the real language primary is almost everything that can be on the sides of +
// but here we only handle numbers detected with the JavaScript 'typeof' keyword
if (typeof tokens[pos] == 'number') {
ret = {
type: 'number',
value: tokens[pos],
}
return pos + 1
}
else {
return undefined
}
}
function parse_operator(pos) {
// let's just reuse the function we already wrote instead of creating another huge 'if'
if (get_precedence(tokens[pos]) != 0) {
ret = {
type: 'operator',
operator: tokens[pos],
}
return pos + 1
}
else {
return undefined
}
}
function parse_expr(pos) {
var stack = [], code = [], n, op, next, precedence
pos = parse_primary(pos)
if (pos == undefined) {
// error, an expression can only start with a primary
return undefined
}
// the first primary goes straight to the output; the stack holds operators only
code.push(ret)
while (true) {
n = pos
pos = parse_operator(pos)
if (pos == undefined) break
op = ret
pos = parse_primary(pos)
if (pos == undefined) break
next = ret
precedence = get_precedence(op.operator)
while (stack.length > 0 && get_precedence(stack[stack.length - 1].operator) >= precedence) {
code.push(stack.pop())
}
stack.push(op)
code.push(next)
}
while(stack.length > 0) {
code.push(stack.pop())
}
if (code.length == 1) ret = code[0]
else ret = {
type: 'expr',
stack: code,
}
return n
}
function main() {
tokens = [1, '+', 2, '*', 3]
var pos = parse_expr(0)
if (pos) {
console.log('parsed expression AST')
console.log(ret)
}
else {
console.log('unable to parse anything')
}
}
main()
Here is a bare-bones implementation of shunting-yard expression parsing, written in JavaScript. It is about as minimal and simple as it gets. Tokenizing is left out for brevity; you give the parser the array of tokens (you call them lexemes).
The actual shunting yard is the parse_expr function. This is the "classic" implementation that uses a stack, which is my preference; some people prefer functional recursion. Its result is the list of operands and operators in postfix order, stored in the stack field of the returned node.
Functions that parse various syntax elements are usually called "parselets". Here we have three of them: one for expressions, the others for primaries and operators. If a parselet detects the corresponding syntax construct at position pos, it returns the next position right after the construct, and the construct itself in AST form is returned via the global variable ret. If the parselet does not find what it expects, it returns undefined.
It is now simple to add support for grouping with parentheses: just extend parse_primary with if (parse_group())... else if (parse_number())... etc. Over time your parse_primary will grow quite large as it comes to support prefix operators, function calls, and so on.
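For comparison, here is a hedged C++ sketch of the same idea that builds a tree directly instead of a postfix list and also handles parentheses. It is a separate illustration, not a translation of the JavaScript above: Ast, precedence, reduce, parse and show are names chosen for this sketch, only the four operators from the question are supported, and every non-operator token is treated as an operand.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Ast {
    std::string text;               // operand text or operator symbol
    std::unique_ptr<Ast> lhs, rhs;  // both null for an operand
};

int precedence(const std::string& op) {
    if (op == "*" || op == "/") return 2;
    if (op == "+" || op == "-") return 1;
    return 0;  // not a binary operator (this includes "(")
}

// Pops one operator and two operands, pushes the combined subtree.
void reduce(std::vector<std::string>& ops, std::vector<std::unique_ptr<Ast>>& out) {
    if (out.size() < 2) throw std::runtime_error("missing operand");
    auto rhs = std::move(out.back()); out.pop_back();
    auto lhs = std::move(out.back()); out.pop_back();
    auto node = std::make_unique<Ast>();
    node->text = ops.back(); ops.pop_back();
    node->lhs = std::move(lhs);
    node->rhs = std::move(rhs);
    out.push_back(std::move(node));
}

std::unique_ptr<Ast> parse(const std::vector<std::string>& tokens) {
    std::vector<std::string> ops;           // operator stack
    std::vector<std::unique_ptr<Ast>> out;  // stack of finished subtrees
    for (const auto& t : tokens) {
        if (t == "(") {
            ops.push_back(t);
        } else if (t == ")") {
            while (!ops.empty() && ops.back() != "(") reduce(ops, out);
            if (ops.empty()) throw std::runtime_error("unbalanced ')'");
            ops.pop_back();  // discard the matching "("
        } else if (precedence(t) > 0) {
            // Left-associative: reduce while the stacked operator binds
            // at least as tightly as the incoming one.
            while (!ops.empty() && precedence(ops.back()) >= precedence(t)) reduce(ops, out);
            ops.push_back(t);
        } else {
            auto leaf = std::make_unique<Ast>();
            leaf->text = t;
            out.push_back(std::move(leaf));
        }
    }
    while (!ops.empty()) {
        if (ops.back() == "(") throw std::runtime_error("unbalanced '('");
        reduce(ops, out);
    }
    if (out.size() != 1) throw std::runtime_error("malformed expression");
    return std::move(out.back());
}

// Renders the tree in prefix form, e.g. (+ a (* b c)).
std::string show(const Ast& n) {
    if (!n.lhs) return n.text;
    return "(" + n.text + " " + show(*n.lhs) + " " + show(*n.rhs) + ")";
}
```

For the tokens a + b * c from the question, show(*parse({"a", "+", "b", "*", "c"})) gives (+ a (* b c)), so the multiplication ends up deeper in the tree than the addition.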

Uncaught Invalid argument: Instance of 'Node'

When working with a sort function, the Node class I created seems to cause some issues.
I have tracked it down to the comparison of Nodes in the merge function of MergeSort. The line of code in question is:
if (_tmpArray[i] <= _tmpArray[j])
_tmpArray is defined in the constructor, but is given its contents in the merge function.
The Node implementations of operator ==, operator <, and operator <= are as follows.
bool operator ==( Node<T> other) => identical(this, other);
bool operator <( Node<T> other){
//other is of same type, T.
if (_value.compareTo(other) == -1){
return true;
}
return false;
}
bool operator <= ( Node<T> other){
return (this == other) || (this < other);
}
It seems that maybe my implementation is wrong. I am doing a test inside of main with a List of size 400, of T = int.
Attached is my Dartpad file: https://dartpad.dartlang.org/612422345f1ac8a27f8e
It seems that the _value.compareTo comparison is not correct, perhaps because T does not have compareTo when T is int. When I convert the int to a String, which is comparable through compareTo, it still shows the same error.
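// Compare the wrapped values; passing the Node itself to compareTo is the invalid argument.
if (_value.compareTo(other._value) < 0){
// ^^^^^^^ was missing
// compareTo only guarantees a negative result here, not exactly -1
return true;
}
return false;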

How can I use either < or > (or other comparative operator) in an expression depending on a function input?

I have two longish blocks of code that are identical except that in various comparisons > is switched with <, >= with <=, and so on. I wanted to put these in a function and use one operator or the other depending on a function input.
I am coding in MQL5, but this is very similar to C++, so hopefully methods that work in C++ will also be usable in my case.
You can create a comparator function for each comparison you need, and then pass the right function as an argument to the longish code blocks (wrapped in a suitably defined function).
As an example, consider the following hypothetical case where a function (myFunc) receives two integers (a and b) and processes them. The processing steps are similar except for the type of comparison done on the arguments. We get around the problem by providing myFunc with the right tool for comparison.
#include <iostream>
using namespace std;
bool comp1(int a, int b) {
return a > b;
}
bool comp2(int a, int b) {
return a < b;
}
void myFunc(int a, int b, bool (*myComp)(int, int)) {
bool res = myComp(a, b);
cout << "value : " << res << endl;
}
int main()
{
myFunc(1, 2, comp1); //use >
myFunc(1, 2, comp2); //use <
return 0;
}
Clearly, comp1 and comp2 are the 2 different comparators. We pass one of them to myFunc depending on the requirements (< or >).
The best thing is that your comparisons can now be as complex as you want, and myFunc is oblivious to the complexities.
In MQL4 you don't have pointers to functions or templates. MQL5 has templates, but formal parameter types can only be built-in or basic user-defined types.
You could try something like:
enum COMPARATOR
{
C_EQUAL = 0,
C_LESS = 1,
C_GREATER = -1,
C_AT_MOST = 2,
C_AT_LEAST = -2,
};
bool cmp(int a, int b, COMPARATOR c)
{
switch (c)
{
case C_LESS: return a < b;
case C_AT_MOST: return a <= b;
case C_EQUAL: return a == b;
case C_AT_LEAST: return a >= b;
case C_GREATER: return a > b;
}
Alert("INTERNAL ERROR: UNKNOWN COMPARISON");
return false;
}
void a_function(COMPARATOR c)
{
if (cmp(MathRand(), 13, c))
Print("BOOM");
// *** If you need the "opposite" of c *** you can write:
if (cmp(Time[0], Time[1], COMPARATOR(-c)))
Alert("DONE");
}
It isn't elegant but it's effective.
Pass in a "comparator" as a function or functor. In this case I'm using the std::less and std::greater functors defined in the functional header; there are functors defined for more or less all the operators.
#include <iostream>
#include <functional>
template<typename Comparator>
void do_something(Comparator comp)
{
int a = 1;
int b = 2;
if (comp(a, b)) {
std::cout << "expression was true" << std::endl;
} else {
std::cout << "expression was not true" << std::endl;
}
}
int main(int argc, char* argv[])
{
do_something(std::greater<int>());
do_something(std::less<int>());
}
Output:
expression was not true
expression was true
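If the choice has to be made at run time from a plain function input, one option in standard C++ is to wrap the functor in std::function. This is a minimal sketch under that assumption; it is not known to carry over to MQL5, and IntCompare, pick_comparator and count_ordered_pairs are hypothetical names standing in for the real "longish block".

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using IntCompare = std::function<bool(int, int)>;

// Hypothetical helper: maps a runtime flag to one of the standard functors.
IntCompare pick_comparator(bool ascending) {
    if (ascending) return std::less<int>();
    return std::greater<int>();
}

// Stand-in for the "longish block": counts adjacent pairs (v[i], v[i+1])
// for which comp(v[i], v[i+1]) holds.
int count_ordered_pairs(const std::vector<int>& v, const IntCompare& comp) {
    int count = 0;
    for (std::size_t i = 0; i + 1 < v.size(); ++i) {
        if (comp(v[i], v[i + 1])) ++count;
    }
    return count;
}
```

For the values 1, 3, 2, 5 the ascending comparator counts two pairs (1 < 3 and 2 < 5) and the descending one counts one (3 > 2), with the loop body written only once.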

Use of context in C++ API

I have the following program, which transforms a string into a Boolean formula (string_to_formula), where I define expr_vector b(c). This code works, but I am not able to reason about the context. What is the function of a context? Is there any way we can define the variable b just once? Why do we need to send the context to the function? And can this code be written in a more succinct way?
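#include <iostream>
#include <sstream>
#include <string>
#include <z3++.h>
using namespace std;
using namespace z3;
expr string_to_formula(string str, context& c); // defined below
int main() {
    try {
        context c;
        expr form = string_to_formula("x1x00xx011", c);
        expr form1 = string_to_formula("1100x1x0", c);
        solver s(c);
        s.add(form && form1);
        // only ask for a model when the formulas are satisfiable
        if (s.check() == sat) {
            model m = s.get_model();
            cout << m << "\n";
        }
    } catch (z3::exception& ex) {
        cout << "unexpected error: " << ex << "\n";
    }
}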
expr string_to_formula(string str, context& c )
{
expr_vector b(c) ;
for ( unsigned i = 0; i < str.length(); i++)
{ stringstream b_name;
b_name << "b_" << i;
b.push_back(c.bool_const(b_name.str().c_str()));
}
expr formula(c);
formula = c.bool_val(true);
for( unsigned i = 0 ; i < str.length() ; ++i )
{ char element = str.at(i) ;
if ( element == '1' )
formula = formula && ( b[i] == c.bool_val(true) ) ;
else if ( element == '0' )
formula = formula && ( b[i] == c.bool_val(false) ) ;
else if ( element == 'x' )
continue;
}
return formula;
}
The context object is relevant for multi-threaded programs.
Each execution thread can have its own context, and they can be accessed without using any form of synchronization (e.g., mutexes).
Each expression belongs to a single context. We cannot use the same expression in two different contexts, but we can copy them from one context to another.
In Z3, expressions are maximally shared. For example, if we have an expression such as (f T T) where T is a big term, then internally Z3 has only one copy of T. To implement this feature, we use a hashtable. The hashtable is stored in the context.
If we use the same context C in two different execution threads, Z3 will probably crash due to race conditions updating C.
If your program has only one execution thread, you can avoid "moving" the context around by having a global variable.
The idea of a context/manager is present in many libraries. For example, CUDD (a BDD library) has a DdManager, and the scripting language Lua has a lua_State. These are all instances of the same idea.
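To make the "maximally shared" point more concrete, here is a toy C++ sketch of a context that owns a hashtable of unique terms. It is only an illustration of the idea and is not how Z3 is actually implemented; Term, Context, mk and size are names invented for this sketch.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Toy term: an operator name plus up to two children.
struct Term {
    int id;          // unique within its Context
    std::string op;
    const Term* a;
    const Term* b;
};

// Toy context: owns every term and hands out one shared node per
// distinct (op, a, b) combination. Not thread-safe, like the real thing,
// so each thread would need its own Context.
class Context {
public:
    const Term* mk(const std::string& op,
                   const Term* a = nullptr, const Term* b = nullptr) {
        // The key identifies a term by its operator and the ids of its children.
        std::string key = op + "#" + std::to_string(a ? a->id : -1) +
                          "#" + std::to_string(b ? b->id : -1);
        auto it = table_.find(key);
        if (it != table_.end()) return it->second.get();  // reuse existing node
        auto node = std::make_unique<Term>(
            Term{static_cast<int>(table_.size()), op, a, b});
        const Term* raw = node.get();
        table_.emplace(std::move(key), std::move(node));
        return raw;
    }
    std::size_t size() const { return table_.size(); }

private:
    std::unordered_map<std::string, std::unique_ptr<Term>> table_;
};
```

Building (f T T) twice in the same Context yields the same pointer both times and stores only two nodes, T and f, while a second Context produces its own separate T. That separation is why an expression created in one context cannot simply be used in another.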
