I am trying to build call graphs using Clang AST.
Is there a way to somehow link the parameters of a function to the arguments of an inner function call?
For example, given the following function:
void chainedIncrement(int *ptr) {
simplePointerIncr(ptr);
for (int i=0;i<3;i++) {
simplePointerIncr(ptr);
}
}
I am looking for a way to link ptr in chainedIncrement to the argument of simplePointerIncr. Doing so would allow me to build a call graph.
Maybe there is a way of getting the same id while calling getId() on parameters and arguments.
I tried to use the following AST matcher:
functionDecl(hasDescendant(callExpr(callee(functionDecl().bind("calleeFunc")),unless(isExpansionInSystemHeader())).bind("callExpr"))).bind("outerFunc")
It seems that arguments are of type Expr while function parameters are of type ParmVarDecl.
Assuming that the parameter is passed as-is, without modification to an inner function, is there a way to link them somehow?
Thanks
UPDATE: Added my solution
There is a matcher called forEachArgumentWithParam(). It pairs each argument of a call with the corresponding parameter of the callee.
Another matcher, equalsBoundNode(), matches only the node that is already bound under the given ID. This ties the parameters of the outer function to the arguments passed to the callee.
auto calleeArgVarDecl = declRefExpr(to(varDecl().bind("callerArg")));
auto innerCallExpr = callExpr(
forEachArgumentWithParam(calleeArgVarDecl, parmVarDecl().bind("calleeParam")),
callee(functionDecl().bind("calleeFunc")),unless(isExpansionInSystemHeader())).bind("callExpr");
auto fullMatcher = functionDecl(forEachDescendant(innerCallExpr),forEachDescendant(parmVarDecl(equalsBoundNode("callerArg")).bind("outerFuncParam"))).bind("outerFunc");
Here is a simplified example:
int add2(int var) {
return var+2;
}
int caller(int var) {
add2(var);
for (int i=0; i<3; i++) {
add2(var);
}
return var;
}
int main(int argc, const char **argv) {
int ret = 0;
caller(ret);
return 0;
}
Use clang-query to show the matcher result:
clang-query> match callExpr(hasAnyArgument(hasAncestor(functionDecl(hasName("caller")))))
Match #1:
~/main.cpp:5:3: note: "root" binds here
add2(var);
^~~~~~~~~
Match #2:
~/main.cpp:7:5: note: "root" binds here
add2(var);
^~~~~~~~~
2 matches.
It matches every call inside caller that has at least one argument, which here are the two calls that pass caller's parameter var.
I can't understand how the closure works in Dart. Why does BMW stay? This explanation causes my neurons to overheat. A lexical closure is a functional object that has access to variables from its lexical domain. Even if it is used outside of its original scope.
void main() {
var car = makeCar('BMW');
print(makeCar);
print(car);
print(makeCar('Tesla'));
print(car('Audi'));
print(car('Nissan'));
print(car('Toyota'));
}
String Function(String) makeCar(String make) {
var ingane = '4.4';
return (model) => '$model,$ingane,$make';
}
Console
Closure 'makeCar'
Closure 'makeCar_closure'
Closure 'makeCar_closure'
Audi,4.4,BMW
Nissan,4.4,BMW
Toyota,4.4,BMW
Calling car('Audi') is equivalent to calling (makeCar('BMW'))('Audi').
A lexical closure is a functional object that has access to variables from its lexical domain. Even if it is used outside of its original scope.
In simple English:
The make parameter stays alive for as long as the returned function is still reachable, because the returned function holds a reference to it.
In essence, you "inject" the information needed by the newly created function. Your car knows that make is "BMW".
I think I figured it out. Here is an example where I left comments. Maybe it will help someone.
void main() {
var pr = funkOut(10); // pr now refers to an instance of the Function class.
// pr is a closure because that instance bundles an anonymous function
// with the lexical environment it was created in (here, int a).
// 10 is bound to a.
print(pr(5)); // 5 is bound to b; prints 15
print(pr(10)); // 10 is bound to b; prints 20
pr = funkOut(20); // 20 is bound to a
print(pr(5)); // 5 is bound to b; prints 25
print(pr); // Closure: (int) => int
}
Function funkOut(int a) {
return (int b) => a + b;
}
I have below test code:
typedef void (*funcPtrType)();
funcPtrType FPT;
void myFunc(){
}
int main(){
FPT = myFunc;
FPT();
return 0;
}
The following is part of the AST dump of this code:
My question is: which API gives me the 'void (*)()' information from the DeclRefExpr node?
I already tried dynamically casting this node to VarDecl, but I could not reach that information from it.
Thanks in advance.
If you have a DeclRefExpr, that is an expression that refers to a declared entity. Call the getDecl method to get the associated ValueDecl, which is the declaration itself. On that object, call getType to get the QualType, which is the type, possibly including cv-qualifiers.
For example:
DeclRefExpr const *dre = ...; // wherever you got it
ValueDecl const *decl = dre->getDecl();
QualType type = decl->getType();
In this case, the type is a typedef. To inspect the underlying type, call getTypePtr to get the unqualified type, then getUnqualifiedDesugaredType to skip typedefs:
clang::Type const *underType = type.getTypePtr()->getUnqualifiedDesugaredType();
You can then call, for example, underType->isPointerType() to find out if it is a pointer type, etc. See the documentation for clang::Type for other ways to query it.
If you want to get a string representation of underType, use the static QualType::print method, something like this:
LangOptions lo;
PrintingPolicy pp(lo);
std::string s;
llvm::raw_string_ostream rso(s);
QualType::print(underType, Qualifiers(), rso, pp, llvm::Twine());
errs() << "type as string: \"" << rso.str() << "\"\n";
For your example, this will print:
type as string: "void (*)()"
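As a cross-check that does not involve Clang at all, the relationship between the typedef and its underlying type can be confirmed in plain C++. This is only a minimal sketch: it reuses the names from the question and relies on std::is_same from the standard library, and it is not part of the Clang API.

```cpp
#include <type_traits>

typedef void (*funcPtrType)();
funcPtrType FPT;

void myFunc() {}

// The typedef is only an alias: after desugaring, the type is void (*)().
static_assert(std::is_same<funcPtrType, void (*)()>::value,
              "funcPtrType should be an alias for void (*)()");

// The variable declared through the typedef has that same underlying type.
static_assert(std::is_same<decltype(FPT), void (*)()>::value,
              "FPT should have type void (*)()");
```

This is the same relationship that getUnqualifiedDesugaredType exposes on the AST side: the typedef name disappears and the pointer-to-function type remains.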
Functions in Dart are first-class objects, allowing you to pass them to other objects or functions.
void main() {
var shout = (msg) => ' ${msg.toUpperCase()} ';
print(shout("yo"));
}
This made me wonder if there was a way to modify a function at run time, just like an object, prior to passing it to something else. For example:
int add(int input) {
return input + 2;
}
If I wanted to make the function a generic addition function, then I would do:
int add(int input, int increment) {
return input + increment;
}
But then the problem would be that the object I am passing the function to would need to specify the increment. I would like to pass the add function to another object, with the increment specified at run time, and declared within the function body so that the increment cannot be changed by the recipient of the function object.
The answer seems to be to use a lexical closure.
From here: https://dart.dev/guides/language/language-tour#built-in-types
A closure is a function object that has access to variables in its
lexical scope, even when the function is used outside of its original
scope.
Functions can close over variables defined in surrounding scopes. In
the following example, makeAdder() captures the variable addBy.
Wherever the returned function goes, it remembers addBy.
/// Returns a function that adds [addBy] to the
/// function's argument.
Function makeAdder(int addBy) {
return (int i) => addBy + i;
}
void main() {
// Create a function that adds 2.
var add2 = makeAdder(2);
// Create a function that adds 4.
var add4 = makeAdder(4);
assert(add2(3) == 5);
assert(add4(3) == 7);
}
In the above cases, we pass 2 or 4 into the makeAdder function. The makeAdder function uses the parameter to create and return a function object that can be passed to other objects.
You most likely don't need to modify a closure; you just need the ability to create customized closures.
The latter is simple:
int Function(int) makeAdder(int increment) => (int value) => value + increment;
...
foo(makeAdder(1)); // Adds 1.
foo(makeAdder(4)); // Adds 4.
You can't change which variables a closure is referencing, but you can change their values ... if you can access the variable. For local variables, that's actually hard.
Mutating state which makes an existing closure change behavior can sometimes be appropriate, but those functions should be very precise about how they change and where they are being used. For a function like add which is used for its behavior, changing the behavior is rarely a good idea. It's better to replace the closure in the specific places that need to change behavior, and not risk changing the behavior in other places which happen to depend on the same closure. Otherwise it becomes very important to control where the closure actually flows.
If you still want to change the behavior of an existing global, you need to change a variable that it depends on.
Globals are easy:
int increment = 1;
int globalAdd(int value) => value + increment;
...
foo(globalAdd); // Adds 1.
increment = 2;
foo(globalAdd); // Adds 2.
I really can't recommend mutating global variables. It scales rather badly. You have no control over anything.
Another option is to use an instance variable to hold the modifiable value.
class MakeAdder {
int increment = 1;
int instanceAdd(int value) => value + increment;
}
...
var makeAdder = MakeAdder();
var adder = makeAdder.instanceAdd;
...
foo(adder); // Adds 1.
makeAdder.increment = 2;
foo(adder); // Adds 2.
That gives you much more control over who can access the increment variable. You can create multiple independent mutable adders without them stepping on each other's toes.
To modify a local variable, you need someone to give you access to it, from inside the function where the variable is visible.
int Function(int) makeAdder(void Function(void Function(int)) setIncrementCallback) {
var increment = 1;
setIncrementCallback((v) {
increment = v;
});
return (value) => value + increment;
}
...
late void Function(int) setIncrement; // `late` because it is assigned inside the callback below
int Function(int) localAdd = makeAdder((inc) { setIncrement = inc; });
...
foo(localAdd); // Adds 1.
setIncrement(2);
foo(localAdd); // Adds 2.
This is one way of passing back a way to modify the local increment variable.
It's almost always far too complicated an approach for what it gives you; I'd go with the instance variable instead.
Often, the instance variable will actually represent something in your model, some state which can meaningfully change, and then it becomes predictable and understandable when and how the state of the entire model changes, including the functions referring to that model.
Using partial function application
You can use a partial function application to bind arguments to functions.
If you have something like:
int add(int input, int increment) => input + increment;
and want to pass it to another function that expects to supply fewer arguments:
int foo(int Function(int input) applyIncrement) => applyIncrement(10);
then you could do:
foo((input) => add(input, 2)); // `increment` is fixed to 2
foo((input) => add(input, 4)); // `increment` is fixed to 4
Using callable objects
Another approach would be to make a callable object:
class Adder {
int increment = 0;
int call(int input) => input + increment;
}
which could be used with the same foo function above:
var adder = Adder()..increment = 2;
print(foo(adder)); // Prints: 12
adder.increment = 4;
print(foo(adder)); // Prints: 14
I've got a tokeniser and a parser. The parser has a special token type, KEYWORD, for keywords (there are ~50). In my parser I want to ensure that the tokens are what I'd expect, so I've got rules for each. Like so:
KW_A = tok.KEYWORDS[_pass = (_1 == "A")];
KW_B = tok.KEYWORDS[_pass = (_1 == "B")];
KW_C = tok.KEYWORDS[_pass = (_1 == "C")];
This works well enough, but it's not case insensitive (and the grammar I'm trying to handle is!). I'd like to use boost::iequals, but attempts to convert _1 to an std::string result in the following error:
error: no viable conversion from 'const _1_type' (aka 'const actor<argument<0> >') to 'std::string' (aka 'basic_string<char>')
How can I treat these keywords as strings and ensure they're the expected text irrespective of case?
A little learning went a long way. I added the following to my lexer:
struct normalise_keyword_impl
{
template <typename Value>
struct result
{
typedef void type;
};
template <typename Value>
void operator()(Value const& val) const
{
// This modifies the original input string.
typedef boost::iterator_range<std::string::iterator> iterpair_type;
iterpair_type const& ip = boost::get<iterpair_type>(val);
std::for_each(ip.begin(), ip.end(),
[](char& in)
{
in = std::toupper(in);
});
}
};
boost::phoenix::function<normalise_keyword_impl> normalise_keyword;
// The rest...
};
And then used phoenix to bind the action to the keyword token in my constructor, like so:
this->self =
KEYWORD [normalise_keyword(_val)]
// The rest...
;
Although this accomplishes what I was after, it modifies the original input sequence. Is there some modification I could make so that I could use const_iterator instead of iterator, and avoid modifying my input sequence?
I tried returning an std::string copied from ip.begin() to ip.end() and uppercased using boost::toupper(...), assigning that to _val. Although it compiled and ran, there were clearly some problems with what it was producing:
Enter a sequence to be tokenised: select a from b
Input is 'select a from b'.
result is SELECT
Token: 0: KEYWORD ('KEYWOR')
Token: 1: REGULAR_IDENTIFIER ('a')
result is FROM
Token: 0: KEYWORD ('KEYW')
Token: 1: REGULAR_IDENTIFIER ('b')
Very peculiar, it appears I have some more learning to do.
Final Solution
Okay, I ended up using this function:
struct normalise_keyword_impl
{
template <typename Value>
struct result
{
typedef std::string type;
};
template <typename Value>
std::string operator()(Value const& val) const
{
// Copy the token and update the attribute value.
typedef boost::iterator_range<std::string::const_iterator> iterpair_type;
iterpair_type const& ip = boost::get<iterpair_type>(val);
auto result = std::string(ip.begin(), ip.end());
result = boost::to_upper_copy(result);
return result;
}
};
And this semantic action:
KEYWORD [_val = normalise_keyword(_val)]
Together with a modified token_type, which is what sorted things out:
typedef std::string::const_iterator base_iterator;
typedef boost::spirit::lex::lexertl::token<base_iterator, boost::mpl::vector<std::string> > token_type;
typedef boost::spirit::lex::lexertl::actor_lexer<token_type> lexer_type;
typedef type_system::Tokens<lexer_type> tokens_type;
typedef tokens_type::iterator_type iterator_type;
typedef type_system::Grammar<iterator_type> grammar_type;
// Establish our lexer and our parser.
tokens_type lexer;
grammar_type parser(lexer);
// ...
The important addition is boost::mpl::vector<std::string>. The result:
Enter a sequence to be tokenised: select a from b
Input is 'select a from b'.
Token: 0: KEYWORD ('SELECT')
Token: 1: REGULAR_IDENTIFIER ('a')
Token: 0: KEYWORD ('FROM')
Token: 1: REGULAR_IDENTIFIER ('b')
I have no idea why this has corrected the problem, so if someone could chime in with their expertise, I'm a willing student.
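For reference, the non-mutating normalisation that the final solution relies on (copy the token text, upper-case the copy, leave the input alone) can be sketched without Boost or Spirit. The helper names upper_copy and keyword_equals are hypothetical and exist only for this illustration. The sketch assumes plain ASCII keywords.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns an upper-cased copy and leaves the input untouched.
std::string upper_copy(const std::string& text) {
    std::string out(text);
    // Go through unsigned char so std::toupper never sees a negative value.
    std::transform(out.begin(), out.end(), out.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return out;
}

// Hypothetical helper: case-insensitive equality for keyword text.
bool keyword_equals(const std::string& token, const std::string& keyword) {
    return upper_copy(token) == upper_copy(keyword);
}
```

With helpers like these, keyword_equals("FrOm", "FROM") holds while the original string keeps its case, which is the behaviour the const_iterator version of the lexer is after.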
I've been trying to use libtooling to rename classes in source, and have hit a snag wrt function returns: there doesn't seem to be an API to get the source extent of just the return type.
I could hack it by assuming the return type is before the function id, but this doesn't handle trailing return types in C++11.
Does anyone have a better suggestion?
Thanks!
// simplified example replacing only value type returns
virtual void run(const ast_matchers::MatchFinder::MatchResult& Result) {
SourceManager& src = *Result.SourceManager;
const FunctionDecl* const function =
Result.Nodes.getDeclAs<FunctionDecl>("function");
CharSourceRange range = Lexer::makeFileCharRange(
CharSourceRange::getTokenRange(function->getLocStart(),
function->getLocation().getLocWithOffset(-1)),
src, LangOptions());
_replace->insert(Replacement(src, range, "newClass"));
}