I'd like to write an analyzer that counts virtual function calls by looking at the C++ AST (output of -ast-dump), but I'm having difficulty determining which function calls are virtual and which are not. Here's an example piece of code:
struct A {
A() {}
virtual int foo() { return 0; }
};
struct B : public A {
B() {}
//virtual int foo() { return 3; }
};
struct C : public B {
C() {}
virtual int foo() { return 1; }
};
int test(C* c) {
return c->foo() + c->B::foo() + c->A::foo();
}
In my initial implementation, I simply check whether or not the function being called is virtual using expr->getMethodDecl()->isVirtual() but (from what I understand) the calls in c->B::foo() and c->A::foo() are not actually virtual so it isn't sufficient to ask if the function being called is virtual.
When I dump the AST from this piece of code, I get the following tree for test:
`-FunctionDecl 0x1d33620 <line:19:1, line:21:1> line:19:5 test 'int (C *)'
|-ParmVarDecl 0x1d33558 <col:10, col:13> col:13 used c 'C *'
`-CompoundStmt 0x1d339f0 <col:16, line:21:1>
`-ReturnStmt 0x1d339e0 <line:20:5, col:47>
`-BinaryOperator 0x1d339c0 <col:12, col:47> 'int' '+'
|-BinaryOperator 0x1d338a8 <col:12, col:33> 'int' '+'
| |-CXXMemberCallExpr 0x1d33778 <col:12, col:19> 'int'
| | `-MemberExpr 0x1d33748 <col:12, col:15> '<bound member function type>' ->foo 0x1d31c80
| | `-ImplicitCastExpr 0x1d33730 <col:12> 'C *' <LValueToRValue>
| | `-DeclRefExpr 0x1d33710 <col:12> 'C *' lvalue ParmVar 0x1d33558 'c' 'C *'
| `-CXXMemberCallExpr 0x1d33848 <col:23, col:33> 'int'
| `-MemberExpr 0x1d33800 <col:23, col:29> '<bound member function type>' ->foo 0x1d02400
| `-ImplicitCastExpr 0x1d33888 <col:23> 'A *' <UncheckedDerivedToBase (A)>
| `-ImplicitCastExpr 0x1d33868 <col:23> 'B *' <UncheckedDerivedToBase (B)>
| `-ImplicitCastExpr 0x1d337d8 <col:23> 'C *' <LValueToRValue>
| `-DeclRefExpr 0x1d33798 <col:23> 'C *' lvalue ParmVar 0x1d33558 'c' 'C *'
`-CXXMemberCallExpr 0x1d33978 <col:37, col:47> 'int'
`-MemberExpr 0x1d33930 <col:37, col:43> '<bound member function type>' ->foo 0x1d02400
`-ImplicitCastExpr 0x1d33998 <col:37> 'A *' <UncheckedDerivedToBase (B -> A)>
`-ImplicitCastExpr 0x1d33908 <col:37> 'C *' <LValueToRValue>
`-DeclRefExpr 0x1d338c8 <col:37> 'C *' lvalue ParmVar 0x1d33558 'c' 'C *'
From looking this, it seems that the UncheckedDerivedToBase cast is marking places where the function call is non-virtual. Is this always the case? Should I always consider a call of the form CXXMemberCallExpr (MemberExpr (ImplicitCastExpr<UncheckedDerivedToBase> e))) to be a non-virtual call? Are there other patterns that would indicate non-virtual function calls? Is there any more robust way to determine this fact?
EDIT: Some more investigation suggests that the above hypothesis that UncheckedDerivedToBase is not sufficient. This code:
struct A {
virtual int foo() { return 100; }
};
struct B : public A {
virtual int foo() { return 10; }
};
int test(B* b) {
return b->foo() + b->B::foo();
}
seem to produce exactly the same AST node (at least indistinguishable on the console) for both calls, but the semantics should be different according to the standard if b is actually a derived class, e.g. C above.
The distinguishing factor is hasQualifier on the MemberExpr of the callee object. If hasQualifier is true, then the function call is non-virtual.
Related
I've been inspecting the type of expressions in the Clang AST and have found some oddities that I'm interested if someone can explain. Here is a piece of code:
struct C {
int x{0};
};
void test(const int n) {
C c;
C a[3];
C b[n];
C x = (5, c);
int foo[5][3] = {};
auto p = new C[5];
auto pp = new C[n];
auto ppp = new C*[n]{};
}
Clang produces the following AST:
`-CompoundStmt 0x562aa86c4750 <col:24, line:14:1>
|-DeclStmt 0x562aa86943d8 <line:6:3, col:6>
| `-VarDecl 0x562aa8693d68 <col:3, col:5> col:5 used c 'C' callinit
| `-CXXConstructExpr 0x562aa86943b0 <col:5> 'C' 'void () noexcept'
|-DeclStmt 0x562aa8694620 <line:7:3, col:9>
| `-VarDecl 0x562aa8694488 <col:3, col:8> col:5 a 'C [3]' callinit
| `-CXXConstructExpr 0x562aa8694518 <col:5> 'C [3]' 'void () noexcept'
|-DeclStmt 0x562aa86c3340 <line:8:3, col:9>
| `-VarDecl 0x562aa86c3280 <col:3, col:8> col:5 b 'C [n]' callinit
| `-CXXConstructExpr 0x562aa86c3318 <col:5> 'C [n]' 'void () noexcept' <<< 1
|-DeclStmt 0x562aa86c3570 <line:9:3, col:15>
| `-VarDecl 0x562aa86c3368 <col:3, col:14> col:5 x 'C' cinit
| `-CXXConstructExpr 0x562aa86c3540 <col:9, col:14> 'C' 'void (const C &) noexcept'
| `-ImplicitCastExpr 0x562aa86c3450 <col:9, col:14> 'const C' lvalue <NoOp>
| `-ParenExpr 0x562aa86c3430 <col:9, col:14> 'C' lvalue
| `-BinaryOperator 0x562aa86c3410 <col:10, col:13> 'C' lvalue ','
| |-IntegerLiteral 0x562aa86c33d0 <col:10> 'int' 5
| `-DeclRefExpr 0x562aa86c33f0 <col:13> 'C' lvalue Var 0x562aa8693d68 'c' 'C'
|-DeclStmt 0x562aa86c37b8 <line:10:3, col:21>
| `-VarDecl 0x562aa86c36c0 <col:3, col:20> col:7 foo 'int [5][3]' cinit
| `-InitListExpr 0x562aa86c3768 <col:19, col:20> 'int [5][3]'
| `-array_filler: ImplicitValueInitExpr 0x562aa86c37a8 <<invalid sloc>> 'int [3]'
|-DeclStmt 0x562aa86c4178 <line:11:3, col:20>
| `-VarDecl 0x562aa86c3830 <col:3, col:19> col:8 p 'C *':'C *' cinit
| `-CXXNewExpr 0x562aa86c4050 <col:12, col:19> 'C *' array Function 0x562aa86c3ad0 'operator new[]' 'void *(unsigned long)'
| |-ImplicitCastExpr 0x562aa86c38c8 <col:18> 'unsigned long' <IntegralCast>
| | `-IntegerLiteral 0x562aa86c3898 <col:18> 'int' 5
| `-CXXConstructExpr 0x562aa86c4028 <col:16> 'C [5]' 'void () noexcept'
|-DeclStmt 0x562aa86c43f8 <line:12:3, col:21>
| `-VarDecl 0x562aa86c41c8 <col:3, col:20> col:8 pp 'C *':'C *' cinit
| `-CXXNewExpr 0x562aa86c4330 <col:13, col:20> 'C *' array Function 0x562aa86c3ad0 'operator new[]' 'void *(unsigned long)'
| |-ImplicitCastExpr 0x562aa86c4290 <col:19> 'unsigned long' <IntegralCast>
| | `-ImplicitCastExpr 0x562aa86c4260 <col:19> 'int' <LValueToRValue>
| | `-DeclRefExpr 0x562aa86c4230 <col:19> 'const int' lvalue ParmVar 0x562aa8693b70 'n' 'const int'
| `-CXXConstructExpr 0x562aa86c4308 <col:17> 'C []' 'void () noexcept' <<< 2
`-DeclStmt 0x562aa86c4738 <line:13:3, col:25>
`-VarDecl 0x562aa86c4448 <col:3, col:24> col:8 ppp 'C **':'C **' cinit
`-CXXNewExpr 0x562aa86c4638 <col:14, col:24> 'C **' array Function 0x562aa86c3ad0 'operator new[]' 'void *(unsigned long)'
|-ImplicitCastExpr 0x562aa86c4560 <col:21> 'unsigned long' <IntegralCast>
| `-ImplicitCastExpr 0x562aa86c4548 <col:21> 'int' <LValueToRValue>
| `-DeclRefExpr 0x562aa86c44b0 <col:21> 'const int' lvalue ParmVar 0x562aa8693b70 'n' 'const int'
`-InitListExpr 0x562aa86c45a8 <col:23, col:24> 'C *[0]' <<< 3
`-array_filler: ImplicitValueInitExpr 0x562aa86c4628 <<invalid sloc>> 'C *'
The oddities / inconsistencies seem to occur with the types of the initializing expressions for new (the last two lines).
In C b[n] (1 above), the type of the CXXConstructorExpr is C [n], but in the CXXNewExpr corresponding to new C[n] (2 above), the (effectively same) CXXConstructorExpr has type C [].
In the last line the InitListExpr (3 above) has type C *[0], but according to C++, this is not a valid type (arrays can not have length 0). As above, I would expect C *[n] (though C *[] could be reasonable as well).
What does the field "0x671xxxx" in the AST structure mean?What APIs can I get to this field?(The example program is as follows. Thanks:)
t12.b = 3;
|-BinaryOperator 0x6712768 <line:22:2, col:10> 'int' '='
| |-MemberExpr 0x6712718 <col:2, col:6> 'int' lvalue .b 0x6711e00
| | `-DeclRefExpr 0x67126f8 <col:2> 'struct test1':'struct test1' lvalue Var 0x6712260 't12' 'struct test1':'struct test1'
| `-IntegerLiteral 0x6712748 <col:10> 'int' 3
My code that I tried is below:
if(const ArraySubscriptExpr *array = Result.Nodes.getNodeAs<ArraySubscriptExpr>("array"))
{
llvm::outs() << array->getBase() <<'\n';
}
getBase() should print the array identifier, but it is printing the address, e.g. 0x559f7da7e838. How can I print the array name/identifier?
For example, in the case of arr[i] = 40;
I want to print arr
getBase returns a pointer to the base expression, so that is why the address is being printed. The AST for arr[i] is:
| |-ArraySubscriptExpr 0xc04c608 <col:3, col:8> 'double' lvalue
| | |-ImplicitCastExpr 0xc04c5d8 <col:3> 'double *' <LValueToRValue>
| | | `-DeclRefExpr 0xc04c598 <col:3> 'double *' lvalue Var 0xc04c480 'arr' 'double *'
| | `-ImplicitCastExpr 0xc04c5f0 <col:7> 'int' <LValueToRValue>
| | `-DeclRefExpr 0xc04c5b8 <col:7> 'int' lvalue Var 0xc04c518 'i' 'int'
As can be seen, the name of the array appears in the children of the ImplicitCastExpr node which is children of ArraySubscriptExpr. This worked for me:
if (auto *array = dyn_cast<ArraySubscriptExpr>(st)) {
if (auto *cast = dyn_cast<ImplicitCastExpr>(array->getBase())) {
if (auto *decl = dyn_cast<DeclRefExpr>(cast->getSubExpr())) {
cout << decl->getNameInfo().getAsString() << endl;
}
}
}
I'm writing a tool that traverses a clang AST and prints it out in a particular format, but I don't want to print function templates, just their (full) specializations. I'm wondering how to determine if a const CXXMethodDecl* represents a fully instantiated method (either it is not templated or it is a full instantiation of a templated method) or a not completely instantiated method. Here is an example of what I am seeing with my tool:
template <typename Z>
class P
{
int lookup ();
};
template <typename Z>
int P<Z>::lookup ()
{
return Z::f();
}
struct X {
static int f();
};
template class P<X>;
constructs the following AST. I don't want to see the first CXXRecordDecl because that is not fully instantiated, but I do want to see the second one (which is fully instantiated).
|-ClassTemplateDecl 0x3eff5b8 <minimal.cpp:1:1, line:5:1> line:2:7 Pte
| |-TemplateTypeParmDecl 0x3eff480 <line:1:11, col:20> col:20 typename depth 0 index 0 PTT
| |-CXXRecordDecl 0x3eff530 <line:2:1, line:5:1> line:2:7 class Pte definition
| | |-DefinitionData empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
| | | |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
| | | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveConstructor exists simple trivial needs_implicit
| | | |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveAssignment exists simple trivial needs_implicit
| | | `-Destructor simple irrelevant trivial needs_implicit
| | |-CXXRecordDecl 0x3eff800 <col:1, col:7> col:7 implicit class Pte
| | `-CXXMethodDecl 0x3eff910 <line:4:5, col:17> col:9 lookup 'int ()'
| `-ClassTemplateSpecialization 0x3f00010 'Pte'
|-CXXMethodDecl 0x3effbb0 parent 0x3eff530 prev 0x3eff910 <line:7:1, line:11:1> line:8:15 lookup 'int ()'
^^^^^^^^^^^^^^^^^^^^^^^^^^ I DO NOT WANT TO SEE THIS
| `-CompoundStmt 0x3effd50 <line:9:1, line:11:1>
| `-ReturnStmt 0x3effd40 <line:10:5, col:19>
| `-CallExpr 0x3effd20 <col:12, col:19> '<dependent type>'
| `-CXXDependentScopeMemberExpr 0x3effcd8 <col:12, col:17> '<dependent type>' lvalue ->f
|-CXXRecordDecl 0x3effd68 <line:13:1, line:15:1> line:13:8 referenced struct X definition
| |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
| | |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
| | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| | |-MoveConstructor exists simple trivial needs_implicit
| | |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
| | |-MoveAssignment exists simple trivial needs_implicit
| | `-Destructor simple irrelevant trivial needs_implicit
| |-CXXRecordDecl 0x3effe78 <col:1, col:8> col:8 implicit struct X
| `-CXXMethodDecl 0x3efff50 <line:14:5, col:18> col:16 used f 'int ()' static
`-ClassTemplateSpecializationDecl 0x3f00010 <line:17:1, col:21> col:16 class Pte definition
|-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
| |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
| |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| |-MoveConstructor exists simple trivial needs_implicit
| |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
| |-MoveAssignment exists simple trivial needs_implicit
| `-Destructor simple irrelevant trivial needs_implicit
|-TemplateArgument type 'X'
|-CXXRecordDecl 0x3f001f8 prev 0x3f00010 <line:2:1, col:7> col:7 implicit class Pte
`-CXXMethodDecl 0x3f00280 <line:8:1, line:11:1> line:4:9 lookup 'int ()'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I want to see this
`-CompoundStmt 0x3f2e1b0 <line:9:1, line:11:1>
`-ReturnStmt 0x3f2e1a0 <line:10:5, col:19>
`-CallExpr 0x3f2e180 <col:12, col:19> 'int'
`-ImplicitCastExpr 0x3f2e168 <col:12, col:17> 'int (*)()' <FunctionToPointerDecay>
`-DeclRefExpr 0x3f2e110 <col:12, col:17> 'int ()' lvalue CXXMethod 0x3efff50 'f' 'int ()'
Whenever type/function/variable is declared in a template or are a template, clang calls it a dependent context. You probably have seen the word dependent all around AST nodes within templates.
I've modified your original test case to include a templated method in a non-template type:
template <typename Z>
class P
{
int lookup ();
};
template <typename Z>
int P<Z>::lookup ()
{
return Z::f();
}
struct X {
static int f() { return 42; }
};
struct Y {
template <typename Z>
int foo() { return Z::f(); }
};
template class P<X>;
Here is a very simple recursive AST visitor printing out only functions that you are interested in:
class NonDependentMethodVisitor
: public clang::ASTConsumer,
public clang::RecursiveASTVisitor<NonDependentMethodVisitor> {
public:
void HandleTranslationUnit(clang::ASTContext &Context) {
this->TraverseTranslationUnitDecl(Context.getTranslationUnitDecl());
}
bool shouldVisitTemplateInstantiations() const { return true; }
bool VisitCXXMethodDecl(clang::CXXMethodDecl *MD) {
if (!MD->isDependentContext()) {
MD->dump();
}
return true;
}
};
which produces the following output for the test snippet:
CXXMethodDecl 0x384eda0 <$TEST_DIR/test.cpp:14:5, col:33> col:16 used f 'int ()' static
`-CompoundStmt 0x384ee80 <col:20, col:33>
`-ReturnStmt 0x384ee70 <col:22, col:29>
`-IntegerLiteral 0x384ee50 <col:29> 'int' 42
CXXMethodDecl 0x387e170 <$TEST_DIR/test.cpp:8:1, line:11:1> line:4:9 lookup 'int ()'
`-CompoundStmt 0x387e450 <line:9:1, line:11:1>
`-ReturnStmt 0x387e440 <line:10:5, col:17>
`-CallExpr 0x387e420 <col:12, col:17> 'int'
`-ImplicitCastExpr 0x387e408 <col:12, col:15> 'int (*)()' <FunctionToPointerDecay>
`-DeclRefExpr 0x387e3b0 <col:12, col:15> 'int ()' lvalue CXXMethod 0x384eda0 'f' 'int ()'
I hope this answers your question!
I was trying to write a simple clang-tidy checker that will check for constructor that is calling fopen() more than once. My indention is to find potential memory leak in case any exception happens in the second fopen() call.
class Dummy_file
{
FILE *f1_;
FILE *f2_;
public:
Dummy_file(const char* f1_name, const char* f2_name, const char * mode){
f1_ = fopen(f1_name, mode);
f2_ = fopen(f2_name, mode);
}
~Dummy_file(){
fclose(f1_);
fclose(f2_);
}
};
Using this
callExpr(callee(functionDecl(hasName("fopen")))).bind("fopencalls")
was able to find all the fopen() calls.
But I could not find cxxConstructorDeclusing this.
cxxConstructorDecl(has(callExpr(callee(functionDecl(hasName("fopen")))))).bind("ctr")
I am doubting since I am using cxxConstructorDecl my filter is not applied to the constructor body.
So how to find function body from a function declaration?
Short explanation
You should use hasDescendant matcher instead of has matcher. While has checks only immediate children of the tested node for the match, hasDescendant matches any descendant.
Here you can see that for your example:
|-CXXConstructorDecl <line:8:3, line:11:3> line:8:3 Dummy_file 'void (const char *, const char *, const char *)'
| |-ParmVarDecl <col:14, col:26> col:26 used f1_name 'const char *'
| |-ParmVarDecl <col:35, col:47> col:47 used f2_name 'const char *'
| |-ParmVarDecl <col:56, col:68> col:68 used mode 'const char *'
| `-CompoundStmt <col:74, line:11:3>
| |-BinaryOperator <line:9:5, col:30> 'FILE *' lvalue '='
| | |-MemberExpr <col:5> 'FILE *' lvalue ->f1_ 0x55d36491a230
| | | `-CXXThisExpr <col:5> 'Dummy_file *' this
| | `-CallExpr <col:11, col:30> 'FILE *'
| | |-ImplicitCastExpr <col:11> 'FILE *(*)(const char *__restrict, const char *__restrict)' <FunctionToPointerDecay>
| | | `-DeclRefExpr <col:11> 'FILE *(const char *__restrict, const char *__restrict)' lvalue Function 0x55d3648fa220 'fopen' 'FILE *(const char *__restrict, const char *__restrict)'
| | |-ImplicitCastExpr <col:17> 'const char *' <LValueToRValue>
| | | `-DeclRefExpr <col:17> 'const char *' lvalue ParmVar 0x55d36491a310 'f1_name' 'const char *'
| | `-ImplicitCastExpr <col:26> 'const char *' <LValueToRValue>
| | `-DeclRefExpr <col:26> 'const char *' lvalue ParmVar 0x55d36491a400 'mode' 'const char *'
CallExpr is a not a child of CXXConstructorDecl, but of BinaryOperator.
Solution
Below I finalized your matcher and checked it in clang-query.
clang-query> match cxxConstructorDecl(hasDescendant(callExpr(callee(functionDecl(hasName("fopen")))).bind("fopencall"))).bind("ctr")
Match #1:
$TEST_DIR/test.cpp:8:3: note: "ctr" binds here
Dummy_file(const char *f1_name, const char *f2_name, const char *mode) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$TEST_DIR/test.cpp:9:11: note: "fopencall" binds here
f1_ = fopen(f1_name, mode);
^~~~~~~~~~~~~~~~~~~~
$TEST_DIR/test.cpp:8:3: note: "root" binds here
Dummy_file(const char *f1_name, const char *f2_name, const char *mode) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 match.
I hope this answers your question!