What is the difference between stack<int> and stack<string>? Could someone please explain the difference?

/*
Evaluation Of postfix Expression in C++
Input Postfix expression must be in a desired format.
Operands must be integers and there should be space in between two operands.
Only '+' , '-' , '*' and '/' operators are expected.
*/
#include <iostream>
#include <stack>
#include <string>
using namespace std;

// Function to evaluate Postfix expression and return output
int EvaluatePostfix(const string& expression);
// Function to perform an operation and return output.
int PerformOperation(char operation, int operand1, int operand2);
// Function to verify whether a character is operator symbol or not.
bool IsOperator(char C);
// Function to verify whether a character is numeric digit.
bool IsNumericDigit(char C);

int main()
{
    string expression;
    cout << "Enter Postfix Expression \n";
    getline(cin, expression);
    int result = EvaluatePostfix(expression);
    cout << "Output = " << result << "\n";
    return 0;
}

// Function to evaluate Postfix expression and return output
int EvaluatePostfix(const string& expression)
{
    // Declaring a stack from the Standard Template Library in C++.
    // What is the difference between stack<int> and stack<string>?
    stack<int> S;
    for (size_t i = 0; i < expression.length(); i++) {
        // Scanning each character from left.
        // If character is a delimiter, move on.
        if (expression[i] == ' ' || expression[i] == ',') continue;
        // If character is operator, pop two elements from stack, perform operation and push the result back.
        else if (IsOperator(expression[i])) {
            // An operator needs two operands; calling top() on an empty stack is undefined behavior.
            if (S.size() < 2) {
                cout << "Malformed expression \n";
                return -1;
            }
            // Pop two operands.
            int operand2 = S.top(); S.pop();
            int operand1 = S.top(); S.pop();
            // Perform operation
            int result = PerformOperation(expression[i], operand1, operand2);
            // Push back result of operation on stack.
            S.push(result);
        }
        else if (IsNumericDigit(expression[i])) {
            // Extract the numeric operand from the string.
            // Keep incrementing i as long as you are getting a numeric digit.
            int operand = 0;
            while (i < expression.length() && IsNumericDigit(expression[i])) {
                // For a number with more than one digit, as we are scanning from left to right:
                // every time we get a digit towards the right, we multiply the current total
                // in operand by 10 and add the new digit.
                operand = (operand * 10) + (expression[i] - '0');
                i++;
            }
            // Finally, you will come out of the while loop with i set to a non-numeric character or end of string.
            // Decrement i because it will be incremented in the increment section of the for loop once again.
            // We do not want to skip the non-numeric character by incrementing i twice.
            i--;
            // Push operand on stack.
            S.push(operand);
        }
    }
    // If expression is in correct format, stack will finally have one element. This will be the output.
    if (S.empty()) {
        cout << "Malformed expression \n";
        return -1;
    }
    return S.top();
}

// Function to verify whether a character is numeric digit.
bool IsNumericDigit(char C)
{
    return C >= '0' && C <= '9';
}

// Function to verify whether a character is operator symbol or not.
bool IsOperator(char C)
{
    return C == '+' || C == '-' || C == '*' || C == '/';
}

// Function to perform an operation and return output.
int PerformOperation(char operation, int operand1, int operand2)
{
    if (operation == '+') return operand1 + operand2;
    else if (operation == '-') return operand1 - operand2;
    else if (operation == '*') return operand1 * operand2;
    else if (operation == '/') {
        // Integer division by zero is undefined behavior, so report it instead.
        if (operand2 == 0) {
            cout << "Division by zero \n";
            return -1;
        }
        return operand1 / operand2;
    }
    else cout << "Unexpected Error \n";
    return -1;
}
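
On the question itself: std::stack is a class template, and the type in angle brackets is the type of element the stack holds. stack<int> stores int values, so top() returns an int that can be used in arithmetic directly, which is why the postfix evaluator uses it. stack<string> stores std::string objects, and + on two of them concatenates rather than adds. A minimal sketch of that difference follows; IntStackDemo and StringStackDemo are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <stack>
#include <string>

// Hypothetical demo: push 2 and 3 on a stack of ints, pop both, add them.
int IntStackDemo() {
    std::stack<int> s;
    s.push(2);
    s.push(3);
    int b = s.top(); s.pop();   // b is an int
    int a = s.top(); s.pop();
    return a + b;               // integer addition
}

// Hypothetical demo: the same steps on a stack of strings.
std::string StringStackDemo() {
    std::stack<std::string> s;
    s.push("2");
    s.push("3");
    std::string b = s.top(); s.pop();   // b is a std::string
    std::string a = s.top(); s.pop();
    return a + b;                       // string concatenation
}
```

IntStackDemo() yields the number 5, while StringStackDemo() yields the text "23". Apart from the element type, both stacks offer the same push, pop, top, empty and size operations.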

Related

Need Lex regular expression to match string up to newline

I want to parse strings of the type :
a=some value
b=some other value
There are no blanks around '=' and values extend up to newline. There may be leading spaces.
My lex specification (relevant part) is:
%%
a= { printf("Found attr %s\n", yytext); return aATTR; }
^[ \r\t]+ { printf("Found space at the start %s\n", yytext); }
([^a-z]=).*$ { printf("Found value %s\n", yytext); }
\n { return NEWLINE; }
%%
I tried .*$, [^\n]* and a few other regular expressions, but to no avail.
This looks pretty simple. Any suggestions? I am also aware that lex returns the longest match so that complicates it further. I get the whole line matched for some regular expressions I tried.
You probably want to incorporate separate start states. These permit you to encode simple contexts. The simple example below captures your id, operator and value on each call to yylex().
%{
char id;
char op;
char *value;
%}
%x VAL OP
%%
<INITIAL>[a-z]+ {
id = yytext[0];
yyleng = 0;
BEGIN OP;
}
<INITIAL,OP>[ \t]*
<OP>=[ \t]* {
op = yytext[0];
yyleng = 0;
BEGIN VAL;
}
<VAL>.*\n {
value = yytext;
BEGIN INITIAL;
return 1;
}
%%
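
To make the start-state idea concrete outside of lex, here is a hand-written C++ sketch of the same three phases (identifier, '=', value up to newline). It follows the format stated in the question, with optional leading blanks and no blanks around '='. The names Attr and ParseAttrLine are hypothetical and are not part of lex or of the specification above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for one "id=value" line.
struct Attr {
    bool ok = false;
    std::string id;
    std::string value;
};

Attr ParseAttrLine(const std::string& line) {
    Attr result;
    std::size_t pos = 0;
    // Phase 1 (like INITIAL): skip leading blanks, then read a lowercase identifier.
    while (pos < line.size() && (line[pos] == ' ' || line[pos] == '\t')) ++pos;
    std::size_t start = pos;
    while (pos < line.size() && std::islower(static_cast<unsigned char>(line[pos]))) ++pos;
    if (pos == start) return result;            // no identifier found
    result.id = line.substr(start, pos - start);
    // Phase 2 (like OP): the next character must be '='.
    if (pos >= line.size() || line[pos] != '=') return result;
    ++pos;
    // Phase 3 (like VAL): everything up to the newline (or end of input) is the value.
    std::size_t end = line.find('\n', pos);
    result.value = (end == std::string::npos) ? line.substr(pos)
                                              : line.substr(pos, end - pos);
    result.ok = true;
    return result;
}
```

Each phase only accepts what is valid in its own context, which is the same effect the exclusive start states have in the scanner.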

Abstract Syntax Tree for Source Code including Expressions

I am building a new simple programming language (just to learn how compilers work in my free time).
I have already built a lexer which can tokenize my source code into lexemes.
However, I am now stuck on how to form an Abstract Syntax Tree from the tokens, where the source code might contain an expression (with operator precedence).
For simplicity, I shall include only 4 basic operators: +, -, /, and * in addition to brackets (). Operator precedence will follow BODMAS rule.
I realize I might be able to convert the expression from infix to prefix/postfix, form the tree and substitute it.
However, I am not sure if that is possible. Even if it is possible, I am not sure how efficient it might be or how difficult it might be to implement.
Is there some trivial way to form the tree in-place without having to convert to prefix/postfix first?
I came across the Shunting Yard algorithm which seems to do this. However, I found it to be quite a complicated algorithm. Is there something simpler, or should I go ahead with implementing the Shunting Yard algorithm?
Currently, the following program is tokenized by my lexer as follows:
I am demonstrating using a Java program for syntax familiarity.
Source Program:
public class Hello
{
public static void main(String[] args)
{
int a = 5;
int b = 6;
int c = 7;
int r = a + b * c;
System.out.println(r);
}
}
Lexer output:
public
class
Hello
{
public
static
void
main
(
String
[
]
args
)
{
int
a
=
5
;
int
b
=
6
;
int
c
=
7
;
int
r
=
a
+
b
*
c
;
System
.
out
.
println
(
r
)
;
}
}
// I know this might look ugly that I use a global variable ret to return parsed subtrees
// but please bear with it, I got used to this for various performance/usability reasons
var ret, tokens
function get_precedence(op) {
// this is an essential part, cannot parse an expression without the precedence checker
if (op == '*' || op == '/' || op == '%') return 14
if (op == '+' || op == '-') return 13
if (op == '<=' || op == '>=' || op == '<' || op == '>') return 11
if (op == '==' || op == '!=') return 10
if (op == '^') return 8
if (op == '&&') return 6
if (op == '||') return 5
return 0
}
function parse_primary(pos) {
// in the real language primary is almost everything that can be on the sides of +
// but here we only handle numbers detected with the JavaScript 'typeof' operator
if (typeof tokens[pos] == 'number') {
ret = {
type: 'number',
value: tokens[pos],
}
return pos + 1
}
else {
return undefined
}
}
function parse_operator(pos) {
// let's just reuse the function we already wrote instead of creating another huge 'if'
if (get_precedence(tokens[pos]) != 0) {
ret = {
type: 'operator',
operator: tokens[pos],
}
return pos + 1
}
else {
return undefined
}
}
function parse_expr(pos) {
var stack = [], code = [], n, op, next, precedence
pos = parse_primary(pos)
if (pos == undefined) {
// error, an expression can only start with a primary
return undefined
}
code.push(ret) // the first operand belongs in the output list, not on the operator stack
while (true) {
n = pos
pos = parse_operator(pos)
if (pos == undefined) break
op = ret
pos = parse_primary(pos)
if (pos == undefined) break
next = ret
precedence = get_precedence(op.operator)
while (stack.length > 0 && get_precedence(stack[stack.length - 1].operator) >= precedence) {
code.push(stack.pop())
}
stack.push(op)
code.push(next)
}
while(stack.length > 0) {
code.push(stack.pop())
}
if (code.length == 1) ret = code[0]
else ret = {
type: 'expr',
stack: code,
}
return n
}
function main() {
tokens = [1, '+', 2, '*', 3]
var pos = parse_expr(0)
if (pos) {
console.log('parsed expression AST')
console.log(ret)
}
else {
console.log('unable to parse anything')
}
}
main()
The code above is a bare-bones implementation of shunting yard expression parsing, written in JavaScript. It is about as minimalistic and simple as you can get. Tokenizing is left out for brevity; you give the parser the array of tokens (you call them lexemes).
The actual Shunting Yard is the parse_expr function. This is the "classic" implementation that uses a stack. This is my preference; some people prefer functional recursion.
Functions that parse various syntax elements are usually called "parselets". Here we have three of them: one for expressions, the others for primaries and operators. If a parselet detects the corresponding syntax construction at the position pos, it returns the next position right after the construct, and the construct itself in AST form is returned via the global variable ret. If the parselet does not find what it expects, it returns undefined.
It is now trivially simple to add support for parenthesized grouping: just extend parse_primary with if (parse_group())... else if (parse_number())... etc. Over time your parse_primary will grow quite large, supporting various things such as prefix operators, function calls, etc.
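
As an illustration of parenthesis handling, here is a minimal C++ sketch. Note that it uses the classic operator-stack treatment of '(' and ')' rather than the recursive parse_group approach suggested in the answer, it emits postfix tokens instead of AST nodes, and it assumes balanced parentheses and only the four operators from the question. Precedence and ToPostfix are names chosen for this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Binding strength of a binary operator; 0 means "not an operator".
int Precedence(const std::string& op) {
    if (op == "*" || op == "/") return 2;
    if (op == "+" || op == "-") return 1;
    return 0;
}

// Convert infix tokens to postfix order (assumes balanced parentheses).
std::vector<std::string> ToPostfix(const std::vector<std::string>& tokens) {
    std::vector<std::string> output;
    std::vector<std::string> ops;   // operator stack
    for (const std::string& tok : tokens) {
        if (tok == "(") {
            ops.push_back(tok);
        } else if (tok == ")") {
            // Flush operators back to the matching "(".
            while (!ops.empty() && ops.back() != "(") {
                output.push_back(ops.back());
                ops.pop_back();
            }
            if (!ops.empty()) ops.pop_back();   // discard the "("
        } else if (Precedence(tok) > 0) {
            // Pop operators that bind at least as tightly (left associativity).
            // "(" has precedence 0, so it is never popped here.
            while (!ops.empty() && Precedence(ops.back()) >= Precedence(tok)) {
                output.push_back(ops.back());
                ops.pop_back();
            }
            ops.push_back(tok);
        } else {
            output.push_back(tok);              // operand
        }
    }
    while (!ops.empty()) {
        output.push_back(ops.back());
        ops.pop_back();
    }
    return output;
}
```

With the tokens 1 + 2 * 3 this produces 1 2 3 * +, and with ( 1 + 2 ) * 3 it produces 1 2 + 3 *. To build a tree instead, each emitted operator would pop two subtrees from a node stack and push the combined node.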

How do you print the conditional statement of an IfStmt in Clang?

I'm developing a plugin for the clang compiler, and would like the conditional expressions of if statements in string form. That is, given:
if (a + b + c > 10)
return;
and a reference to the IfStmt node that represents it, I would like to obtain the string "a + b + c > 10".
I suspect that isn't possible, but if anybody has any insight, it would be greatly appreciated.
Extract the condition part of the IfStmt, take its start and end location and use this to query the lexer for the underlying source code.
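#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Lex/Lexer.h"

using namespace clang;

class IfStmtVisitor
    : public RecursiveASTVisitor<IfStmtVisitor> {
    SourceManager &sm;    // Initialize me!
    CompilerInstance &ci; // Initialize me!

public:
    // Must be accessible to RecursiveASTVisitor, which calls it for every IfStmt.
    bool VisitIfStmt(IfStmt *stmt) {
        Expr *expr = stmt->getCond();
        bool invalid = false;
        // Newer Clang releases rename getLocStart()/getLocEnd() to getBeginLoc()/getEndLoc().
        CharSourceRange conditionRange =
            CharSourceRange::getTokenRange(expr->getLocStart(), expr->getLocEnd());
        StringRef str =
            Lexer::getSourceText(conditionRange, sm, ci.getLangOpts(), &invalid);
        if (invalid) {
            return false;
        }
        llvm::outs() << "Condition: " << str << "\n";
        return true;
    }
};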
Input source:
bool f(int a, int b, int c)
{
if (a + b + c > 10)
return true;
return false;
}
Output:
Condition: a + b + c > 10
I believe you may want to try looking at the printPretty function defined in Stmt which IfStmt inherits from. That should hopefully get you close to what you want.

Parse stacked comparison expression into logical Conjunction Tree with antlr3

I have run into a problem when I tried to parse a stacked arithmetic comparison expression:
"1<2<3<4<5"
into a logical Tree of Conjunctions:
CONJUNCTION(COMPARISON(1,2,<) COMPARISON(2,3,<) COMPARISON(3,4,<) COMPARISON(4,5,<))
Is there a way in ANTLR3 tree rewrite rules to iterate through matched tokens and create the result tree from them in the target language (I'm using Java)? That way I could make COMPARISON nodes from elements x-1 and x of the matched 'addition' tokens. I know I can reference the last result of a rule, but that way I'd only get nested COMPARISON rules, which is not what I want.
//This is how I approached the problem; sadly, it doesn't do what I would like it to do yet.
fragment COMPARISON:;
operator
:
('<'|'>'|'<='|'>='|'=='|'!=')
;
comparison
@init{boolean secondpart = false;}
:
e=addition (operator {secondpart=true;} k=addition)*
-> {secondpart}? ^(COMPARISON ^(VALUES addition*) ^(OPERATORS operator*))
-> $e
;
//Right now what this does is:
tree=(COMPARISON (VALUES (INTEGERVALUE (VALUE 1)) (INTEGERVALUE (VALUE 2)) (INTEGERVALUE (VALUE 3)) (INTEGERVALUE (VALUE 4)) (INTEGERVALUE (VALUE 5))) (OPERATORS < < < <))
//The label for the CONJUNCTION TreeNode that i would like to use:
fragment CONJUNCTION:;
I came up with a nasty solution to this problem by writing the actual tree-building Java code:
grammar testgrammarforcomparison;
options {
language = Java;
output = AST;
}
tokens
{
CONJUNCTION;
COMPARISON;
OPERATOR;
ADDITION;
}
WS
:
('\t' | '\f' | ' ' | '\r' | '\n' )+
{$channel = HIDDEN;}
;
comparison
@init
{
List<Object> additions = new ArrayList<Object>();
List<Object> operators = new ArrayList<Object>();
boolean secondpart = false;
}
:
(( e=addition {additions.add(e.getTree());} ) ( op=operator k=addition {additions.add(k.getTree()); operators.add(op.getTree()); secondpart = true;} )*)
{
if(secondpart)
{
root_0 = (Object)adaptor.nil();
Object root_1 = (Object)adaptor.nil();
root_1 = (Object)adaptor.becomeRoot(
(Object)adaptor.create(CONJUNCTION, "CONJUNCTION")
, root_1);
Object lastaddition = additions.get(0);
for(int i=1;i<additions.size();i++)
{
Object root_2 = (Object)adaptor.nil();
root_2 = (Object)adaptor.becomeRoot(
(Object)adaptor.create(COMPARISON, "COMPARISON")
, root_2);
adaptor.addChild(root_2, additions.get(i-1));
adaptor.addChild(root_2, operators.get(i-1));
adaptor.addChild(root_2, additions.get(i));
adaptor.addChild(root_1, root_2);
}
adaptor.addChild(root_0, root_1);
}
else
{
root_0 = (Object)adaptor.nil();
adaptor.addChild(root_0, e.getTree());
}
}
;
/** lowercase letters */
fragment LOWCHAR
: 'a'..'z';
/** uppercase letters */
fragment HIGHCHAR
: 'A'..'Z';
/** numbers */
fragment DIGIT
: '0'..'9';
fragment LETTER
: LOWCHAR
| HIGHCHAR
;
IDENTIFIER
:
LETTER (LETTER|DIGIT)*
;
addition
:
IDENTIFIER ->^(ADDITION IDENTIFIER)
;
operator
:
('<'|'>') ->^(OPERATOR '<'* '>'*)
;
parse
:
comparison EOF
;
For input
"DATA1 < DATA2 > DATA3"
This outputs a tree such as:
If you guys know any better solutions, please tell me about them
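
Independent of ANTLR, the pairing step itself is small: each COMPARISON node takes operands i-1 and i plus operator i-1. The C++ sketch below builds the textual form of the desired tree from two parallel lists. BuildConjunction is a hypothetical helper, and it assumes there is exactly one more operand than there are operators.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Turn a chain such as 1<2<3 into a flat conjunction of pairwise comparisons.
std::string BuildConjunction(const std::vector<std::string>& operands,
                             const std::vector<std::string>& operators) {
    if (operands.empty()) return "";
    // A single operand is not a comparison; return it unchanged,
    // like the else branch of the grammar action.
    if (operators.empty()) return operands[0];
    std::string tree = "CONJUNCTION(";
    for (std::size_t i = 1; i < operands.size() && i - 1 < operators.size(); ++i) {
        if (i > 1) tree += " ";
        // Pair each operand with its left neighbour and the operator between them.
        tree += "COMPARISON(" + operands[i - 1] + "," + operands[i] + "," +
                operators[i - 1] + ")";
    }
    return tree + ")";
}
```

For the operands 1 to 5 with four '<' operators this yields the flat form asked for in the question, with every inner operand appearing in two adjacent COMPARISON nodes.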

How to print parser tree in Yacc (BISON)

I made a parser for the C- language using Bison and Flex. It works and prints "syntax error" in the terminal if the given C- input code is syntactically wrong; otherwise it prints nothing.
But I want to print the parse tree for the given C- input code as the output of my parser. How do I do that? Is there a function in Bison that can be used to print the parse tree?
The TXR language (http://www.nongnu.org/txr) uses Flex and Yacc for parsing its input. You can see the parse tree if you give it the -v option.
E.g.:
$ ./txr -v -c "#/[a-z]*|foo/"
spec:
(((text (#<sys:regex: 9d99268> or (0+ (set (#\a . #\z))) (compound #\f #\o #\o)))))
You construct the tree in the parser actions and print it yourself with a tree-printing routine. I used a Lisp-like object representation to make life easier.
Writing this out is handled by a recursive printing function which recognizes all the possible object types and renders them into notation. For instance above you see objects of character type printed with a hash-backslash notation, and the unprintable, opaque, compiled regex is printed using the notation #< ... >.
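
A recursive tree printer of this kind can be quite short. The sketch below is not TXR's code; it is a minimal C++ illustration with a hypothetical Node type (a label plus child nodes) that a parser action could build, and a ToSExpr function that renders it in Lisp-like notation. It assumes C++17, where std::vector may be declared with an incomplete element type.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical parse-tree node: a label and zero or more children.
struct Node {
    std::string label;
    std::vector<Node> children;
};

// Render a tree as a nested list: leaves print as their label,
// inner nodes as "(label child1 child2 ...)".
std::string ToSExpr(const Node& node) {
    if (node.children.empty()) return node.label;
    std::string out = "(" + node.label;
    for (const Node& child : node.children) {
        out += " " + ToSExpr(child);   // recurse into each subtree
    }
    return out + ")";
}
```

Each grammar action would create a Node for its rule and attach the nodes of its right-hand-side symbols as children; after a successful parse, calling ToSExpr on the root gives the printable tree.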
Here is a part of the grammar:
regexpr : regbranch { $$ = if3(cdr($1),
cons(compound_s, $1),
car($1)); }
| regexpr '|' regexpr { $$ = list(or_s, $1, $3, nao); }
| regexpr '&' regexpr { $$ = list(and_s, $1, $3, nao); }
| '~' regexpr { $$ = list(compl_s, $2, nao); }
| /* empty */ %prec LOW { $$ = nil; }
;
As you can see, constructing the AST is largely just simple construction of nested lists.
This form is very convenient to compile. The top-level function of the NFA-based regex compiler is very readable:
/*
* Input is the items from a regex form,
* not including the regex symbol.
* I.e. (rest '(regex ...)) not '(regex ...).
*/
static nfa_t nfa_compile_regex(val exp)
{
if (nullp(exp)) {
nfa_state_t *acc = nfa_state_accept();
nfa_state_t *s = nfa_state_empty(acc, 0);
return nfa_make(s, acc);
} else if (typeof(exp) == chr_s) {
nfa_state_t *acc = nfa_state_accept();
nfa_state_t *s = nfa_state_single(acc, c_chr(exp));
return nfa_make(s, acc);
} else if (exp == wild_s) {
nfa_state_t *acc = nfa_state_accept();
nfa_state_t *s = nfa_state_wild(acc);
return nfa_make(s, acc);
} else {
val sym = first(exp), args = rest(exp);
if (sym == set_s) {
return nfa_compile_set(args, nil);
} else if (sym == cset_s) {
return nfa_compile_set(args, t);
} else if (sym == compound_s) {
return nfa_compile_list(args);
} else if (sym == zeroplus_s) {
nfa_t nfa_arg = nfa_compile_regex(first(args));
nfa_state_t *acc = nfa_state_accept();
/* New start state has empty transitions going through
the inner NFA, or skipping it right to the new acceptance state. */
nfa_state_t *s = nfa_state_empty(nfa_arg.start, acc);
/* Convert acceptance state of inner NFA to one which has
an empty transition back to the start state, and
an empty transition to the new acceptance state. */
nfa_state_empty_convert(nfa_arg.accept, nfa_arg.start, acc);
return nfa_make(s, acc);
} else if (sym == oneplus_s) {
/* One-plus case differs from zero-plus in that the new start state
does not have an empty transition to the acceptance state.
So the inner NFA must be traversed once. */
nfa_t nfa_arg = nfa_compile_regex(first(args));
nfa_state_t *acc = nfa_state_accept();
nfa_state_t *s = nfa_state_empty(nfa_arg.start, 0); /* <-- diff */
nfa_state_empty_convert(nfa_arg.accept, nfa_arg.start, acc);
return nfa_make(s, acc);
} else if (sym == optional_s) {
/* In this case, we can keep the acceptance state of the inner
NFA as the acceptance state of the new NFA. We simply add
a new start state which can short-circuit to it via an empty
transition. */
nfa_t nfa_arg = nfa_compile_regex(first(args));
nfa_state_t *s = nfa_state_empty(nfa_arg.start, nfa_arg.accept);
return nfa_make(s, nfa_arg.accept);
} else if (sym == or_s) {
/* Simple: make a new start and acceptance state, which form
the ends of a spindle that goes through two branches. */
nfa_t nfa_first = nfa_compile_regex(first(args));
nfa_t nfa_second = nfa_compile_regex(second(args));
nfa_state_t *acc = nfa_state_accept();
/* New state s has empty transitions into each inner NFA. */
nfa_state_t *s = nfa_state_empty(nfa_first.start, nfa_second.start);
/* Acceptance state of each inner NFA converted to empty
transition to new combined acceptance state. */
nfa_state_empty_convert(nfa_first.accept, acc, 0);
nfa_state_empty_convert(nfa_second.accept, acc, 0);
return nfa_make(s, acc);
} else {
internal_error("bad operator in regex");
}
}
}

Resources