I'm developing a plugin for the clang compiler, and would like the conditional expressions of if statements in string form. That is, given:
if (a + b + c > 10)
return;
and a reference to the IfStmt node that represents it, I would like to obtain the string "a + b + c > 10".
I suspect that isn't possible, but if anybody has any insight, it would be greatly appreciated.
Extract the condition part of the IfStmt, take its start and end location and use this to query the lexer for the underlying source code.
using namespace clang;
class IfStmtVisitor
: public RecursiveASTVisitor<IfStmtVisitor> {
SourceManager &sm; // Initialize me!
CompilerInstance &ci; // Initialize me!
bool VisitIfStmt(IfStmt *stmt) {
Expr *expr = stmt->getCond();
bool invalid;
CharSourceRange conditionRange =
CharSourceRange::getTokenRange(expr->getLocStart(), expr->getLocEnd());
StringRef str =
Lexer::getSourceText(conditionRange, sm, ci.getLangOpts(), &invalid);
if (invalid) {
return false;
}
llvm::outs() << "Condition: " << str << "\n";
return true;
}
};
Input source:
bool f(int a, int b, int c)
{
if (a + b + c > 10)
return true;
return false;
}
Output:
Condition string: a + b + c > 10
I believe you may want to try looking at the printPretty function defined in Stmt which IfStmt inherits from. That should hopefully get you close to what you want.
Related
I've got more of my expression parser working (Dart PetitParser to get at AST datastructure created with ExpressionBuilder). It appears to be generating accurate ASTs for floats, parens, power, multiply, divide, add, subtract, unary negative in front of both numbers and expressions. (The nodes are either literal strings, or an object that has a precedence with a List payload that gets walked and concatenated.)
I'm stuck now on visiting the nodes. I have clean access to the top node (thanks to Lukas), but I'm stuck on deciding whether or not to add a paren. For example, in 20+30*40, we don't need parens around 30*40, and the parse tree correctly has the node for this closer to the root so I'll hit it first during traversal. However, I don't seem to have enough data when looking at the 30*40 node to determine if it needs parens before going on to the 20+.. A very similar case would be (20+30)*40, which gets parsed correctly with 20+30 closer to the root, so once again, when visiting the 20+30 node I need to add parens before going on to *40.
This has to be a solved problem, but I never went to compiler school, so I know just enough about ASTs to be dangerous. What "a ha" am I missing?
// rip-common.dart:
import 'package:petitparser/petitparser.dart';
// import 'package:petitparser/debug.dart';
class Node {
int precedence;
List<dynamic> args;
Node([this.precedence = 0, this.args = const []]) {
// nodeList.add(this);
}
#override
String toString() => 'Node($precedence $args)';
String visit([int fromPrecedence = -1]) {
print('=== visiting $this ===');
var buf = StringBuffer();
var parens = (precedence > 0) &&
(fromPrecedence > 0) &&
(precedence < fromPrecedence);
print('<$fromPrecedence $precedence $parens>');
// for debugging:
var curlyOpen = '';
var curlyClose = '';
buf.write(parens ? '(' : curlyOpen);
for (var arg in args) {
if (arg is Node) {
buf.write(arg.visit(precedence));
} else if (arg is String) {
buf.write(arg);
} else {
print('not Node or String: $arg');
buf.write('$arg');
}
}
buf.write(parens ? ')' : curlyClose);
print('$buf for buf');
return '$buf';
}
}
class RIPParser {
Parser _make_parser() {
final builder = ExpressionBuilder();
var number = char('-').optional() &
digit().plus() &
(char('.') & digit().plus()).optional();
// precedence 5
builder.group()
..primitive(number.flatten().map((a) => Node(0, [a])))
..wrapper(char('('), char(')'), (l, a, r) => Node(0, [a]));
// negation is a prefix operator
// precedence 4
builder.group()..prefix(char('-').trim(), (op, a) => Node(4, [op, a]));
// power is right-associative
// precedence 3
builder.group()..right(char('^').trim(), (a, op, b) => Node(3, [a, op, b]));
// multiplication and addition are left-associative
// precedence 2
builder.group()
..left(char('*').trim(), (a, op, b) => Node(2, [a, op, b]))
..left(char('/').trim(), (a, op, b) => Node(2, [a, op, b]));
// precedence 1
builder.group()
..left(char('+').trim(), (a, op, b) => Node(1, [a, op, b]))
..left(char('-').trim(), (a, op, b) => Node(1, [a, op, b]));
final parser = builder.build().end();
return parser;
}
Result _result(String input) {
var parser = _make_parser(); // eventually cache
var result = parser.parse(input);
return result;
}
String parse(String input) {
var result = _result(input);
if (result.isFailure) {
return result.message;
} else {
print('result.value = ${result.value}');
return '$result';
}
}
String visit(String input) {
var result = _result(input);
var top_node = result.value; // result.isFailure ...
return top_node.visit();
}
}
// rip_cmd_example.dart
import 'dart:io';
import 'package:rip_common/rip_common.dart';
void main() {
print('start');
String input;
while (true) {
input = stdin.readLineSync();
if (input.isEmpty) {
break;
}
print(RIPParser().parse(input));
print(RIPParser().visit(input));
}
;
print('done');
}
As you've observed, the ExpressionBuilder already assembles the tree in the right precedence order based on the operator groups you've specified.
This also happens for the wrapping parens node created here: ..wrapper(char('('), char(')'), (l, a, r) => Node(0, [a])). If I test for this node, I get back the input string for your example expressions: var parens = precedence == 0 && args.length == 1 && args[0] is Node;.
Unless I am missing something, there should be no reason for you to track the precedence manually. I would also recommend that you create different node classes for the different operators: ValueNode, ParensNode, NegNode, PowNode, MulNode, ... A bit verbose, but much easier to understand what is going on, if each of them can just visit (print, evaluate, optimize, ...) itself.
I am building a new simple programming language (just to learn how compilers work in my free time).
I have already built a lexer which can tokenize my source code into lexemes.
However, I am now stuck on how to form an Abstract Syntax Tree from the tokens, where the source code might contain an expression (with operator precedence).
For simplicity, I shall include only 4 basic operators: +, -, /, and * in addition to brackets (). Operator precedence will follow BODMAS rule.
I realize I might be able to convert the expression from infix to prefix/postfix, form the tree and substitute it.
However, I am not sure if that is possible. Even if it is possible, I am not sure how efficient it might be or how difficult it might be to implement.
Is there some trivial way to form the tree in-place without having to convert to prefix/postfix first?
I came across the Shunting Yard algorithm which seems to do this. However, I found it to be quite a complicated algorithm. Is there something simpler, or should I go ahead with implementing the Shunting Yard algorithm?
Currently, the following program is tokenized by my lexer as follows:
I am demonstrating using a Java program for syntax familiarity.
Source Program:
public class Hello
{
public static void main(String[] args)
{
int a = 5;
int b = 6;
int c = 7;
int r = a + b * c;
System.out.println(r);
}
}
Lexer output:
public
class
Hello
{
public
static
void
main
(
String
[
]
args
)
{
int
a
=
5
;
int
b
=
6
;
int
c
=
7
;
int
r
=
a
+
b
*
c
;
System
.
out
.
println
(
r
)
;
}
}
// I know this might look ugly that I use a global variable ret to return parsed subtrees
// but please bear with it, I got used to this for various performance/usability reasons
var ret, tokens
function get_precedence(op) {
// this is an essential part, cannot parse an expression without the precedence checker
if (op == '*' || op == '/' || op == '%') return 14
if (op == '+' || op == '-') return 13
if (op == '<=' || op == '>=' || op == '<' || op == '>') return 11
if (op == '==' || op == '!=') return 10
if (op == '^') return 8
if (op == '&&') return 6
if (op == '||') return 5
return 0
}
function parse_primary(pos) {
// in the real language primary is almost everything that can be on the sides of +
// but here we only handle numbers detected with the JavaScript 'typeof' keyword
if (typeof tokens[pos] == 'number') {
ret = {
type: 'number',
value: tokens[pos],
}
return pos + 1
}
else {
return undefined
}
}
function parse_operator(pos) {
// let's just reuse the function we already wrote insted of creating another huge 'if'
if (get_precedence(tokens[pos]) != 0) {
ret = {
type: 'operator',
operator: tokens[pos],
}
return pos + 1
}
else {
return undefined
}
}
function parse_expr(pos) {
var stack = [], code = [], n, op, next, precedence
pos = parse_primary(pos)
if (pos == undefined) {
// error, an expression can only start with a primary
return undefined
}
stack.push(ret)
while (true) {
n = pos
pos = parse_operator(pos)
if (pos == undefined) break
op = ret
pos = parse_primary(pos)
if (pos == undefined) break
next = ret
precedence = get_precedence(op.operator)
while (stack.length > 0 && get_precedence(stack[stack.length - 1].operator) >= precedence) {
code.push(stack.pop())
}
stack.push(op)
code.push(next)
}
while(stack.length > 0) {
code.push(stack.pop())
}
if (code.length == 1) ret = code[0]
else ret = {
type: 'expr',
stack: code,
}
return n
}
function main() {
tokens = [1, '+', 2, '*', 3]
var pos = parse_expr(0)
if (pos) {
console.log('parsed expression AST')
console.log(ret)
}
else {
console.log('unable to parse anything')
}
}
main()
Here is your bare-bones implementation of shunting yard expression parsing. This is written in JavaScript. This is as minimalistic and simple as you can get. Tokenizing is left off for brevity, you give the parse the array of tokens (you call them lexemes).
The actual Shunting Yard is the parse_expr function. This is the "classic" implementation that uses the stack, this is my preference, some people prefer functional recursion.
Functions that parse various syntax elements are usually called "parselets". here we have three of them, one for expression, others are for primary and operator. If a parselet detects the corresponding syntax construction at the position pos it will return the next position right after the construct, and the construct itself in AST form is returned via the global variable ret. If the parselet does not find what it expects it returns undefined.
It is now trivially simple to add support for parens grouping (, just extend parse_primary with if (parse_group())... else if (parse_number())... etc. In the meantime your parse_primary will grow real big supporting various things, prefix operators, function calls, etc.
I would like to know how I can handle multiple optionals without concrete pattern matching for each possible permutation.
Below is a simplified example of the problem I am facing:
lexical Int = [0-9]+;
syntax Bool = "True" | "False";
syntax Period = "Day" | "Month" | "Quarter" | "Year";
layout Standard = [\ \t\n\f\r]*;
syntax Optionals = Int? i Bool? b Period? p;
str printOptionals(Optionals opt){
str res = "";
if(!isEmpty("<opt.i>")) { // opt has i is always true (same for opt.i?)
res += printInt(opt.i);
}
if(!isEmpty("<opt.b>")){
res += printBool(opt.b);
}
if(!isEmpty("<opt.p>")) {
res += printPeriod(opt.period);
}
return res;
}
str printInt(Int i) = "<i>";
str printBool(Bool b) = "<b>";
str printPeriod(Period p) = "<p>";
However this gives the error message:
The called signature: printInt(opt(lex("Int"))), does not match the declared signature: str printInt(sort("Int"));
How do I get rid of the opt part when I know it is there?
I'm not sure how ideal this is, but you could do this for now:
if (/Int i := opt.i) {
res += printInt(i);
}
This will extract the Int from within opt.i if it is there, but the match will fail if Int was not provided as one of the options.
The current master on github has the following feature to deal with optionals: they can be iterated over.
For example:
if (Int i <- opt.i) {
res += printInt(i);
}
The <- will produce false immediately if the optional value is absent, and otherwise loop once through and bind the value which is present to the pattern.
An untyped solution is to project out the element from the parse tree:
rascal>opt.i.args[0];
Tree: `1`
Tree: appl(prod(lex("Int"),[iter(\char-class([range(48,57)]))],{}),[appl(regular(iter(\char-class([range(48,57)]))),[char(49)])[#loc=|file://-|(0,1,<1,0>,<1,1>)]])[#loc=|file://-|(0,1,<1,0>,<1,1>)]
However, then to transfer this back to an Int you'd have to pattern match, like so:
rascal>if (Int i := opt.i.args[0]) { printInt(i); }
str: "1"
One could write a generic cast function to help out here:
rascal>&T cast(type[&T] t, value v) { if (&T a := v) return a; throw "cast exception"; }
ok
rascal>printInt(cast(#Int, opt.i.args[0]))
str: "1"
Still, I believe Rascal is missing a feature here. Something like this would be a good feature request:
rascal>Int j = opt.i.value;
rascal>opt.i has value
bool: true
I have two longish blocks of code that are identical except in various comparative statements > is switched with <, >= with <= etc. I wanted to put these in a function and use one operator or another depending on a function input.
I am coding in MQL5 but this is very similar to C++ so hopefully methods that work in this will also be useable in my case.
You can create a comparator function for each comparison you need, and then pass the right function as an argument to the longish code blocks (wrapped in a suitably defined function)
As an example, consider the following hypothetical case where a function (myFunc) receives 2 integers(a and b)
and processes them. The processing steps are similar except for the type of comparison done on the arguments. We get around the problem by providing myFunc with the right tool for comparison.
#include <iostream>
using namespace std;
bool comp1(int a, int b) {
return a > b;
}
bool comp2(int a, int b) {
return a < b;
}
void myFunc(int a, int b, bool (*myComp)(int, int)) {
bool res = myComp(a, b);
cout << "value : " << res << endl;
}
int main()
{
myFunc(1, 2, comp1); //use >
myFunc(1, 2, comp2); //use <
return 0;
}
Clearly, comp1 and comp2 are the 2 different comparators. We pass one of them to myFunc depending on the requirements (< or >).
The best thing is that your comparisons can now be as complex as you want, and myFunc is oblivious to the complexities.
Coding in MQL4 you haven't pointers to function / templates. MQL5 has templates but formal parameter types are only built-in or basic user-defined types.
You could try something like:
enum COMPARATOR
{
C_EQUAL = 0,
C_LESS = 1,
C_GREATER = -1
C_AT_MOST = 2,
C_AT_LEAST = -2,
};
bool cmp(int a, int b, COMPARATOR c)
{
switch (c)
{
case C_LESS: return a < b;
case C_AT_MOST: return a <= b;
case C_EQUAL: return a == b;
case C_AT_LEAST: return a >= b;
case C_GREATER: return a > b;
}
Alert("INTERNAL ERROR: UNKNOWN COMPARISON");
return false;
}
void a_function(COMPARATOR c)
{
if (cmp(MathRand(), 13, c))
Print("BOOM");
// *** If you need the "opposite" of c *** you can write:
if (cmp(Time[0], Time[1], COMPARATOR(-c))
Alert("DONE");
}
It isn't elegant but it's effective.
Pass in a "comparator" as a function or functor, in this case I'm using the std::less and std::greater functors defined in the functional header, there are functors defined for more or less all the operators.
#include <iostream>
#include <functional>
template<typename Comparator>
void do_something(Comparator comp)
{
int a = 1;
int b = 2;
if (comp(a, b)) {
std::cout << "expression was true" << std::endl;
} else {
std::cout << "expression was not true" << std::endl;
}
}
int main(int argc, char* argv[])
{
do_something(std::greater<int>());
do_something(std::less<int>());
}
Output:
expression was not true
expression was true
In short, I need to be able to traverse Z3_ast tree and access the data associated with its nodes. Cannot seem to find any documentation/examples on how to do that. Any pointers would be helpful.
At length, I need to parse smt2lib type formulae into Z3, make some variable to constant substitutions and then reproduce the formula in a data structure which is compatible with another unrelated SMT sovler (mistral to be specific, I don't think details about mistral are important to this question but funnily enough it does not have a command line interface where I can feed it text formulae. It just has a C API). I have figured that to generate the formula in mistral's format, I would need to traverse the Z3_ast tree and reconstruct the formula in the desired format. I cannot seem to find any documentation/examples that demonstrate how to do this. Any pointers would be helpful.
Consider using the C++ auxiliary classes defined at z3++.h. The Z3 distribution also includes an example using these classes. Here is a small code fragment that traverses a Z3 expression.
If your formulas do not contain quantifiers, then you don't even need to handle the is_quantifier() and is_var() branches.
void visit(expr const & e) {
if (e.is_app()) {
unsigned num = e.num_args();
for (unsigned i = 0; i < num; i++) {
visit(e.arg(i));
}
// do something
// Example: print the visited expression
func_decl f = e.decl();
std::cout << "application of " << f.name() << ": " << e << "\n";
}
else if (e.is_quantifier()) {
visit(e.body());
// do something
}
else {
assert(e.is_var());
// do something
}
}
void tst_visit() {
std::cout << "visit example\n";
context c;
expr x = c.int_const("x");
expr y = c.int_const("y");
expr z = c.int_const("z");
expr f = x*x - y*y >= 0;
visit(f);
}