I've been inspecting the types of expressions in the Clang AST and have found some oddities that I'd like someone to explain. Here is a piece of code:
struct C {
  int x{0};
};
void test(const int n) {
  C c;
  C a[3];
  C b[n];
  C x = (5, c);
  int foo[5][3] = {};
  auto p = new C[5];
  auto pp = new C[n];
  auto ppp = new C*[n]{};
}
Clang produces the following AST:
`-CompoundStmt 0x562aa86c4750 <col:24, line:14:1>
|-DeclStmt 0x562aa86943d8 <line:6:3, col:6>
| `-VarDecl 0x562aa8693d68 <col:3, col:5> col:5 used c 'C' callinit
| `-CXXConstructExpr 0x562aa86943b0 <col:5> 'C' 'void () noexcept'
|-DeclStmt 0x562aa8694620 <line:7:3, col:9>
| `-VarDecl 0x562aa8694488 <col:3, col:8> col:5 a 'C [3]' callinit
| `-CXXConstructExpr 0x562aa8694518 <col:5> 'C [3]' 'void () noexcept'
|-DeclStmt 0x562aa86c3340 <line:8:3, col:9>
| `-VarDecl 0x562aa86c3280 <col:3, col:8> col:5 b 'C [n]' callinit
| `-CXXConstructExpr 0x562aa86c3318 <col:5> 'C [n]' 'void () noexcept' <<< 1
|-DeclStmt 0x562aa86c3570 <line:9:3, col:15>
| `-VarDecl 0x562aa86c3368 <col:3, col:14> col:5 x 'C' cinit
| `-CXXConstructExpr 0x562aa86c3540 <col:9, col:14> 'C' 'void (const C &) noexcept'
| `-ImplicitCastExpr 0x562aa86c3450 <col:9, col:14> 'const C' lvalue <NoOp>
| `-ParenExpr 0x562aa86c3430 <col:9, col:14> 'C' lvalue
| `-BinaryOperator 0x562aa86c3410 <col:10, col:13> 'C' lvalue ','
| |-IntegerLiteral 0x562aa86c33d0 <col:10> 'int' 5
| `-DeclRefExpr 0x562aa86c33f0 <col:13> 'C' lvalue Var 0x562aa8693d68 'c' 'C'
|-DeclStmt 0x562aa86c37b8 <line:10:3, col:21>
| `-VarDecl 0x562aa86c36c0 <col:3, col:20> col:7 foo 'int [5][3]' cinit
| `-InitListExpr 0x562aa86c3768 <col:19, col:20> 'int [5][3]'
| `-array_filler: ImplicitValueInitExpr 0x562aa86c37a8 <<invalid sloc>> 'int [3]'
|-DeclStmt 0x562aa86c4178 <line:11:3, col:20>
| `-VarDecl 0x562aa86c3830 <col:3, col:19> col:8 p 'C *':'C *' cinit
| `-CXXNewExpr 0x562aa86c4050 <col:12, col:19> 'C *' array Function 0x562aa86c3ad0 'operator new[]' 'void *(unsigned long)'
| |-ImplicitCastExpr 0x562aa86c38c8 <col:18> 'unsigned long' <IntegralCast>
| | `-IntegerLiteral 0x562aa86c3898 <col:18> 'int' 5
| `-CXXConstructExpr 0x562aa86c4028 <col:16> 'C [5]' 'void () noexcept'
|-DeclStmt 0x562aa86c43f8 <line:12:3, col:21>
| `-VarDecl 0x562aa86c41c8 <col:3, col:20> col:8 pp 'C *':'C *' cinit
| `-CXXNewExpr 0x562aa86c4330 <col:13, col:20> 'C *' array Function 0x562aa86c3ad0 'operator new[]' 'void *(unsigned long)'
| |-ImplicitCastExpr 0x562aa86c4290 <col:19> 'unsigned long' <IntegralCast>
| | `-ImplicitCastExpr 0x562aa86c4260 <col:19> 'int' <LValueToRValue>
| | `-DeclRefExpr 0x562aa86c4230 <col:19> 'const int' lvalue ParmVar 0x562aa8693b70 'n' 'const int'
| `-CXXConstructExpr 0x562aa86c4308 <col:17> 'C []' 'void () noexcept' <<< 2
`-DeclStmt 0x562aa86c4738 <line:13:3, col:25>
`-VarDecl 0x562aa86c4448 <col:3, col:24> col:8 ppp 'C **':'C **' cinit
`-CXXNewExpr 0x562aa86c4638 <col:14, col:24> 'C **' array Function 0x562aa86c3ad0 'operator new[]' 'void *(unsigned long)'
|-ImplicitCastExpr 0x562aa86c4560 <col:21> 'unsigned long' <IntegralCast>
| `-ImplicitCastExpr 0x562aa86c4548 <col:21> 'int' <LValueToRValue>
| `-DeclRefExpr 0x562aa86c44b0 <col:21> 'const int' lvalue ParmVar 0x562aa8693b70 'n' 'const int'
`-InitListExpr 0x562aa86c45a8 <col:23, col:24> 'C *[0]' <<< 3
`-array_filler: ImplicitValueInitExpr 0x562aa86c4628 <<invalid sloc>> 'C *'
The oddities/inconsistencies concern the types of the initializing expressions of the new-expressions (the last two lines).
In C b[n] (1 above), the type of the CXXConstructExpr is C [n], but in the CXXNewExpr corresponding to new C[n] (2 above), the (effectively identical) CXXConstructExpr has type C [].
In the last line, the InitListExpr (3 above) has type C *[0], but in standard C++ this is not a valid type (arrays cannot have length 0). As above, I would expect C *[n] (though C *[] would also be reasonable).
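Whatever array type Clang records on these initializers, the array_filler: ImplicitValueInitExpr children stand for value-initialization of every element the braced list does not cover, and for new C*[n]{} that means all n elements, even though n is only known at run time. A small standalone check of that runtime behaviour (the function names all_zero_2d and all_null_ptrs are hypothetical):

```cpp
#include <cassert>

struct C {
  int x{0};
};

// int foo[5][3] = {}: every int element is value-initialized to 0.
bool all_zero_2d() {
  int foo[5][3] = {};
  for (auto& row : foo)
    for (int v : row)
      if (v != 0) return false;
  return true;
}

// new C*[n]{}: with a runtime n, the empty braced list still
// value-initializes all n pointers, i.e. sets each one to nullptr.
bool all_null_ptrs(int n) {
  C** ppp = new C*[n]{};
  bool ok = true;
  for (int i = 0; i < n; ++i)
    if (ppp[i] != nullptr) ok = false;
  delete[] ppp;
  return ok;
}
```

For example, assert(all_zero_2d()) and assert(all_null_ptrs(7)) both hold, so the semantics do not depend on the [0] bound shown in the dump.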
What does the hexadecimal field (e.g. "0x671xxxx") in the AST dump mean? Which APIs can I use to get at this field? (The example program is as follows. Thanks!)
t12.b = 3;
|-BinaryOperator 0x6712768 <line:22:2, col:10> 'int' '='
| |-MemberExpr 0x6712718 <col:2, col:6> 'int' lvalue .b 0x6711e00
| | `-DeclRefExpr 0x67126f8 <col:2> 'struct test1':'struct test1' lvalue Var 0x6712260 't12' 'struct test1':'struct test1'
| `-IntegerLiteral 0x6712748 <col:10> 'int' 3
I have the following C code, test1.c:
double multiply(double x, double y) {
    return x * y * y;
}
int main(int argc, char const *argv[])
{
    double a;
    int b;
    float d;
    double x = 3.0;
    double y = 5.0;
    double z = multiply(x,y);
    return 0;
}
I'm trying to get the RecursiveASTVisitor to visit variable declarations (clang::VarDecl) by implementing bool VistVarDecl(clang::VarDecl *vardecl). However, the clang::VarDecl nodes are never visited, even though I've managed to visit all the other nodes in the AST. Moreover, using clang-query on test1.c, I can match varDecl.
My RecursiveASTVisitor is as follows:
struct MyASTVisitor : public clang::RecursiveASTVisitor<MyASTVisitor> {
  bool VistVarDecl(clang::VarDecl *vardecl) {
    llvm::outs() << "Found a VarDecl";
    return true;
  }
  bool VisitFunctionDecl(clang::FunctionDecl *decl) {
    llvm::outs() << "Found a FunctionDecl";
    return true;
  }
  // other functions implemented similarly just to see if it visits properly
  bool VisitParmVarmDecl(clang::ParmVarDecl *paramvardecl);
  bool VisitCallExpr(clang::CallExpr *callexpr);
  bool VisitImplicitCastExpr(clang::ImplicitCastExpr *castexpr);
  bool VisitBinaryOperator(clang::BinaryOperator *bo);
  bool VisitDeclStmt(clang::DeclStmt *declstmt);
  bool VisitDeclRefExpr(clang::DeclRefExpr *declrefexpr);
  bool VisitFloatingLiteral(clang::FloatingLiteral *floatliteral);
};
struct MyASTConsumer : public clang::ASTConsumer {
  bool HandleTopLevelDecl(clang::DeclGroupRef DR) override {
    for (clang::DeclGroupRef::iterator b = DR.begin(), e = DR.end(); b != e; ++b) {
      Visitor.TraverseDecl(*b);
    }
    return true;
  }
private:
  MyASTVisitor Visitor;
};
Does anyone know why clang::VarDecl nodes are the only nodes that aren't visited by the RecursiveASTVisitor but are matched by clang-query?
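One thing worth checking: the snippet spells the method VistVarDecl (and VisitParmVarmDecl). RecursiveASTVisitor dispatches to Visit* methods statically (CRTP) by exact name and provides a default no-op for each one, so a misspelled method is simply never called and the compiler does not complain. The sketch below is a toy model of that dispatch, not Clang's actual code; Decl, VarDecl, ToyRecursiveVisitor, Misspelled and Correct are hypothetical stand-ins:

```cpp
#include <cassert>

// Toy stand-ins for clang::Decl / clang::VarDecl (hypothetical).
struct Decl {};
struct VarDecl : Decl {};

// Simplified CRTP dispatch: the base calls getDerived().VisitVarDecl(...)
// and also declares a default VisitVarDecl that does nothing.
template <typename Derived>
struct ToyRecursiveVisitor {
  Derived& getDerived() { return *static_cast<Derived*>(this); }

  bool VisitVarDecl(VarDecl*) { return true; }  // default: no-op

  bool TraverseVarDecl(VarDecl* d) {
    // Resolved at compile time by name lookup in Derived, not virtually.
    return getDerived().VisitVarDecl(d);
  }
};

struct Misspelled : ToyRecursiveVisitor<Misspelled> {
  int hits = 0;
  // Typo: does not hide the base's VisitVarDecl, so it is never called.
  bool VistVarDecl(VarDecl*) { ++hits; return true; }
};

struct Correct : ToyRecursiveVisitor<Correct> {
  int hits = 0;
  // Same name as the base's default, so this one is selected.
  bool VisitVarDecl(VarDecl*) { ++hits; return true; }
};
```

Given VarDecl v;, calling TraverseVarDecl(&v) on a Misspelled instance leaves its hits at 0, while a Correct instance records one hit. Because the dispatch is static rather than virtual, the override keyword cannot be used to catch this kind of typo.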
The AST dump from clang is given below:
TranslationUnitDecl 0x7fba2300ce08 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x7fba2300d6a0 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x7fba2300d3a0 '__int128'
|-TypedefDecl 0x7fba2300d710 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x7fba2300d3c0 'unsigned __int128'
|-TypedefDecl 0x7fba2300da18 <<invalid sloc>> <invalid sloc> implicit __NSConstantString 'struct __NSConstantString_tag'
| `-RecordType 0x7fba2300d7f0 'struct __NSConstantString_tag'
| `-Record 0x7fba2300d768 '__NSConstantString_tag'
|-TypedefDecl 0x7fba2300dab0 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x7fba2300da70 'char *'
| `-BuiltinType 0x7fba2300cea0 'char'
|-TypedefDecl 0x7fba2300dda8 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'struct __va_list_tag [1]'
| `-ConstantArrayType 0x7fba2300dd50 'struct __va_list_tag [1]' 1
| `-RecordType 0x7fba2300db90 'struct __va_list_tag'
| `-Record 0x7fba2300db08 '__va_list_tag'
|-FunctionDecl 0x7fba22829370 <test4.c:2:1, line:4:1> line:2:8 used multiply 'double (double, double)'
| |-ParmVarDecl 0x7fba22829218 <col:17, col:24> col:24 used x 'double'
| |-ParmVarDecl 0x7fba22829298 <col:27, col:34> col:34 used y 'double'
| `-CompoundStmt 0x7fba22829560 <col:37, line:4:1>
| `-ReturnStmt 0x7fba22829550 <line:3:2, col:17>
| `-BinaryOperator 0x7fba22829530 <col:9, col:17> 'double' '*'
| |-BinaryOperator 0x7fba228294d8 <col:9, col:13> 'double' '*'
| | |-ImplicitCastExpr 0x7fba228294a8 <col:9> 'double' <LValueToRValue>
| | | `-DeclRefExpr 0x7fba22829468 <col:9> 'double' lvalue ParmVar 0x7fba22829218 'x' 'double'
| | `-ImplicitCastExpr 0x7fba228294c0 <col:13> 'double' <LValueToRValue>
| | `-DeclRefExpr 0x7fba22829488 <col:13> 'double' lvalue ParmVar 0x7fba22829298 'y' 'double'
| `-ImplicitCastExpr 0x7fba22829518 <col:17> 'double' <LValueToRValue>
| `-DeclRefExpr 0x7fba228294f8 <col:17> 'double' lvalue ParmVar 0x7fba22829298 'y' 'double'
`-FunctionDecl 0x7fba228297d0 <line:6:1, line:22:1> line:6:5 main 'int (int, const char **)'
|-ParmVarDecl 0x7fba22829590 <col:10, col:14> col:14 argc 'int'
|-ParmVarDecl 0x7fba228296b0 <col:20, col:37> col:32 argv 'const char **':'const char **'
`-CompoundStmt 0x7fba22829da8 <line:7:1, line:22:1>
|-DeclStmt 0x7fba22829928 <line:10:2, col:10>
| `-VarDecl 0x7fba228298c0 <col:2, col:9> col:9 a 'double'
|-DeclStmt 0x7fba228299c0 <line:12:2, col:7>
| `-VarDecl 0x7fba22829958 <col:2, col:6> col:6 b 'int'
|-DeclStmt 0x7fba22829a58 <line:14:2, col:9>
| `-VarDecl 0x7fba228299f0 <col:2, col:8> col:8 d 'float'
|-DeclStmt 0x7fba22829b10 <line:16:2, col:16>
| `-VarDecl 0x7fba22829a88 <col:2, col:13> col:9 used x 'double' cinit
| `-FloatingLiteral 0x7fba22829af0 <col:13> 'double' 3.000000e+00
|-DeclStmt 0x7fba22829bc8 <line:17:2, col:16>
| `-VarDecl 0x7fba22829b40 <col:2, col:13> col:9 used y 'double' cinit
| `-FloatingLiteral 0x7fba22829ba8 <col:13> 'double' 5.000000e+00
|-DeclStmt 0x7fba22829d60 <line:19:2, col:26>
| `-VarDecl 0x7fba22829bf8 <col:2, col:25> col:9 z 'double' cinit
| `-CallExpr 0x7fba22829d00 <col:13, col:25> 'double'
| |-ImplicitCastExpr 0x7fba22829ce8 <col:13> 'double (*)(double, double)' <FunctionToPointerDecay>
| | `-DeclRefExpr 0x7fba22829c60 <col:13> 'double (double, double)' Function 0x7fba22829370 'multiply' 'double (double, double)'
| |-ImplicitCastExpr 0x7fba22829d30 <col:22> 'double' <LValueToRValue>
| | `-DeclRefExpr 0x7fba22829c80 <col:22> 'double' lvalue Var 0x7fba22829a88 'x' 'double'
| `-ImplicitCastExpr 0x7fba22829d48 <col:24> 'double' <LValueToRValue>
| `-DeclRefExpr 0x7fba22829ca0 <col:24> 'double' lvalue Var 0x7fba22829b40 'y' 'double'
`-ReturnStmt 0x7fba22829d98 <line:21:2, col:9>
`-IntegerLiteral 0x7fba22829d78 <col:9> 'int' 0
I'd like to write an analyzer that counts virtual function calls by looking at the C++ AST (output of -ast-dump), but I'm having difficulty determining which function calls are virtual and which are not. Here's an example piece of code:
struct A {
    A() {}
    virtual int foo() { return 0; }
};
struct B : public A {
    B() {}
    //virtual int foo() { return 3; }
};
struct C : public B {
    C() {}
    virtual int foo() { return 1; }
};
int test(C* c) {
    return c->foo() + c->B::foo() + c->A::foo();
}
In my initial implementation, I simply check whether the function being called is virtual using expr->getMethodDecl()->isVirtual(), but (from what I understand) the calls c->B::foo() and c->A::foo() are not actually virtual, so it isn't sufficient to ask whether the function being called is virtual.
When I dump the AST from this piece of code, I get the following tree for test:
`-FunctionDecl 0x1d33620 <line:19:1, line:21:1> line:19:5 test 'int (C *)'
|-ParmVarDecl 0x1d33558 <col:10, col:13> col:13 used c 'C *'
`-CompoundStmt 0x1d339f0 <col:16, line:21:1>
`-ReturnStmt 0x1d339e0 <line:20:5, col:47>
`-BinaryOperator 0x1d339c0 <col:12, col:47> 'int' '+'
|-BinaryOperator 0x1d338a8 <col:12, col:33> 'int' '+'
| |-CXXMemberCallExpr 0x1d33778 <col:12, col:19> 'int'
| | `-MemberExpr 0x1d33748 <col:12, col:15> '<bound member function type>' ->foo 0x1d31c80
| | `-ImplicitCastExpr 0x1d33730 <col:12> 'C *' <LValueToRValue>
| | `-DeclRefExpr 0x1d33710 <col:12> 'C *' lvalue ParmVar 0x1d33558 'c' 'C *'
| `-CXXMemberCallExpr 0x1d33848 <col:23, col:33> 'int'
| `-MemberExpr 0x1d33800 <col:23, col:29> '<bound member function type>' ->foo 0x1d02400
| `-ImplicitCastExpr 0x1d33888 <col:23> 'A *' <UncheckedDerivedToBase (A)>
| `-ImplicitCastExpr 0x1d33868 <col:23> 'B *' <UncheckedDerivedToBase (B)>
| `-ImplicitCastExpr 0x1d337d8 <col:23> 'C *' <LValueToRValue>
| `-DeclRefExpr 0x1d33798 <col:23> 'C *' lvalue ParmVar 0x1d33558 'c' 'C *'
`-CXXMemberCallExpr 0x1d33978 <col:37, col:47> 'int'
`-MemberExpr 0x1d33930 <col:37, col:43> '<bound member function type>' ->foo 0x1d02400
`-ImplicitCastExpr 0x1d33998 <col:37> 'A *' <UncheckedDerivedToBase (B -> A)>
`-ImplicitCastExpr 0x1d33908 <col:37> 'C *' <LValueToRValue>
`-DeclRefExpr 0x1d338c8 <col:37> 'C *' lvalue ParmVar 0x1d33558 'c' 'C *'
From looking at this, it seems that the UncheckedDerivedToBase cast marks places where the function call is non-virtual. Is this always the case? Should I always consider a call of the form CXXMemberCallExpr(MemberExpr(ImplicitCastExpr<UncheckedDerivedToBase>(e))) to be a non-virtual call? Are there other patterns that indicate non-virtual function calls? Is there a more robust way to determine this?
EDIT: Some more investigation suggests that the above hypothesis (that an UncheckedDerivedToBase cast marks a non-virtual call) is not sufficient. This code:
struct A {
    virtual int foo() { return 100; }
};
struct B : public A {
    virtual int foo() { return 10; }
};
int test(B* b) {
    return b->foo() + b->B::foo();
}
seems to produce exactly the same AST nodes (at least indistinguishable on the console) for both calls, but according to the standard the semantics should differ if b actually points to an object of a class derived from B, such as C in the first example.
The distinguishing factor is hasQualifier() on the MemberExpr that forms the callee of the CXXMemberCallExpr. If hasQualifier() is true (as for b->B::foo()), the call is non-virtual; otherwise it is a virtual call whenever the called method is virtual.
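The language rule behind this check: naming the function with a qualified name, as in b->B::foo() or c->A::foo(), suppresses virtual dispatch, while b->foo() dispatches on the dynamic type of *b. The standalone sketch below reuses A and B from the EDIT and adds a hypothetical class C derived from B; the helper function names are also hypothetical:

```cpp
#include <cassert>

struct A {
  virtual ~A() = default;
  virtual int foo() { return 100; }
};
struct B : public A {
  int foo() override { return 10; }
};
// Hypothetical further-derived class.
struct C : public B {
  int foo() override { return 1; }
};

// Unqualified call: virtual dispatch on the dynamic type of *b.
int unqualified(B* b) { return b->foo(); }
// Qualified call: always B::foo, whatever the dynamic type is.
int qualified(B* b) { return b->B::foo(); }
// Qualified call naming a base further up the hierarchy.
int qualified_base(B* b) { return b->A::foo(); }
```

With C c;, unqualified(&c) returns 1, qualified(&c) returns 10 and qualified_base(&c) returns 100, which is why isVirtual() on the called method alone over-counts virtual calls.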
I was trying to write a simple clang-tidy check that flags constructors calling fopen() more than once. My intention is to find potential leaks in case an exception is thrown during the second fopen() call.
class Dummy_file
{
  FILE *f1_;
  FILE *f2_;
public:
  Dummy_file(const char* f1_name, const char* f2_name, const char * mode){
    f1_ = fopen(f1_name, mode);
    f2_ = fopen(f2_name, mode);
  }
  ~Dummy_file(){
    fclose(f1_);
    fclose(f2_);
  }
};
Using the following matcher, I was able to find all the fopen() calls:
callExpr(callee(functionDecl(hasName("fopen")))).bind("fopencalls")
But I could not find the constructor using this:
cxxConstructorDecl(has(callExpr(callee(functionDecl(hasName("fopen")))))).bind("ctr")
I suspect that, because I am using cxxConstructorDecl, my filter is not applied to the constructor body.
So how do I get from a function declaration to its body?
Short explanation
You should use the hasDescendant matcher instead of has. While has checks only the immediate children of the tested node, hasDescendant matches any descendant.
Here you can see this for your example:
|-CXXConstructorDecl <line:8:3, line:11:3> line:8:3 Dummy_file 'void (const char *, const char *, const char *)'
| |-ParmVarDecl <col:14, col:26> col:26 used f1_name 'const char *'
| |-ParmVarDecl <col:35, col:47> col:47 used f2_name 'const char *'
| |-ParmVarDecl <col:56, col:68> col:68 used mode 'const char *'
| `-CompoundStmt <col:74, line:11:3>
| |-BinaryOperator <line:9:5, col:30> 'FILE *' lvalue '='
| | |-MemberExpr <col:5> 'FILE *' lvalue ->f1_ 0x55d36491a230
| | | `-CXXThisExpr <col:5> 'Dummy_file *' this
| | `-CallExpr <col:11, col:30> 'FILE *'
| | |-ImplicitCastExpr <col:11> 'FILE *(*)(const char *__restrict, const char *__restrict)' <FunctionToPointerDecay>
| | | `-DeclRefExpr <col:11> 'FILE *(const char *__restrict, const char *__restrict)' lvalue Function 0x55d3648fa220 'fopen' 'FILE *(const char *__restrict, const char *__restrict)'
| | |-ImplicitCastExpr <col:17> 'const char *' <LValueToRValue>
| | | `-DeclRefExpr <col:17> 'const char *' lvalue ParmVar 0x55d36491a310 'f1_name' 'const char *'
| | `-ImplicitCastExpr <col:26> 'const char *' <LValueToRValue>
| | `-DeclRefExpr <col:26> 'const char *' lvalue ParmVar 0x55d36491a400 'mode' 'const char *'
CallExpr is not a child of CXXConstructorDecl but of BinaryOperator.
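The difference can be modelled with a toy tree. The Node type and the has/hasDescendant functions below are hypothetical stand-ins, not the AST matcher implementation; the tree mirrors the dump, with the CallExpr under a BinaryOperator inside the constructor's CompoundStmt, so only the recursive search finds it:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy AST node (hypothetical; not Clang's AST classes).
struct Node {
  std::string kind;
  std::vector<Node> children;
};

// Like `has`: only the node's immediate children are tested.
bool has(const Node& n, const std::string& kind) {
  for (const Node& c : n.children)
    if (c.kind == kind) return true;
  return false;
}

// Like `hasDescendant`: children, grandchildren, and so on.
bool hasDescendant(const Node& n, const std::string& kind) {
  for (const Node& c : n.children)
    if (c.kind == kind || hasDescendant(c, kind)) return true;
  return false;
}

// Same shape as the dump: CXXConstructorDecl -> CompoundStmt
// -> BinaryOperator -> CallExpr.
Node makeCtor() {
  Node call{"CallExpr", {}};
  Node assign{"BinaryOperator", {call}};
  Node body{"CompoundStmt", {assign}};
  return Node{"CXXConstructorDecl", {Node{"ParmVarDecl", {}}, body}};
}
```

For Node ctor = makeCtor();, has(ctor, "CallExpr") is false while hasDescendant(ctor, "CallExpr") is true, matching the behaviour of the two matchers on this constructor.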
Solution
Below I finalized your matcher and checked it in clang-query.
clang-query> match cxxConstructorDecl(hasDescendant(callExpr(callee(functionDecl(hasName("fopen")))).bind("fopencall"))).bind("ctr")
Match #1:
$TEST_DIR/test.cpp:8:3: note: "ctr" binds here
Dummy_file(const char *f1_name, const char *f2_name, const char *mode) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$TEST_DIR/test.cpp:9:11: note: "fopencall" binds here
f1_ = fopen(f1_name, mode);
^~~~~~~~~~~~~~~~~~~~
$TEST_DIR/test.cpp:8:3: note: "root" binds here
Dummy_file(const char *f1_name, const char *f2_name, const char *mode) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 match.
I hope this answers your question!
In an LLVM-9.x plugin, I'd like to access the initializer (= 42) associated with variable Y in the template specialization T<42> foo:
template<int X>
class T {
  //Clang produces different ASTs on this code in C++14 and C++17
  //(in 17, there's no initializer for Y (= 42) in the specialized
  //class foo below).
  static constexpr int Y = X;
};
T<42> foo;
With clang++ -std=c++14 I get an AST (excerpted) that looks like:
> clang++ -std=c++14 -Xclang -ast-dump -fsyntax-only file.cpp
|-ClassTemplateDecl 0x563e38c10980 <test_constexpr.cpp:1:1, line:7:1> line:2:7 T
...
| | `-VarDecl 0x563e38c10c98 <line:6:3, col:28> col:24 Y 'const int' static constexpr cinit
| | `-DeclRefExpr 0x563e38c10d00 <col:28> 'int' NonTypeTemplateParm 0x563e38c10878 'X' 'int'
| `-ClassTemplateSpecializationDecl 0x563e38c10d78 <line:1:1, line:7:1> line:2:7 class T
...
| |-VarDecl 0x563e38c110c8 <line:6:3, col:28> col:24 Y 'const int' static constexpr cinit
| | `-SubstNonTypeTemplateParmExpr 0x563e38c11160 <col:28> 'int' <== MISSING in C++17 BELOW
| | `-IntegerLiteral 0x563e38c11140 <col:28> 'int' 42 <== MISSING in C++17 BELOW
and things work fine: assuming decl is the second VarDecl for Y, I'm able to access 42 via decl->getInit().
With clang++ -std=c++17, however, I get the new AST:
> clang++ -std=c++17 -Xclang -ast-dump -fsyntax-only file.cpp
|-ClassTemplateDecl 0x56114838ff40 <test_constexpr.cpp:1:1, line:7:1> line:2:7 T
...
| | `-VarDecl 0x561148390258 <line:6:3, col:28> col:24 Y 'const int' static inline constexpr cinit
| | `-DeclRefExpr 0x5611483902c0 <col:28> 'int' NonTypeTemplateParm 0x56114838fe38 'X' 'int'
| `-ClassTemplateSpecializationDecl 0x561148390338 <line:1:1, line:7:1> line:2:7 class T
...
| |-VarDecl 0x561148390688 <line:6:3, col:24> col:24 Y 'const int' static constexpr
In this AST, the initializing expression for Y (-IntegerLiteral 0x563e38c11140 <col:28> 'int' 42) no longer appears. Adding a use of Y, such as int z = Y;, to the class definition makes the initializer reappear.
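The inline marker in the C++17 dump (static inline constexpr) reflects a language change: since C++17 a static constexpr data member is implicitly an inline variable, so the in-class declaration is also its definition. This accounts only for the inline marker, not for the missing initializer. A standalone sketch, assuming -std=c++17 (Y is made public here so it can be read, and refToY is a hypothetical helper), that odr-uses Y without any out-of-class definition:

```cpp
#include <cassert>

template <int X>
class T {
 public:
  // Implicitly inline in C++17, so no out-of-class definition is needed.
  static constexpr int Y = X;
};

// Binding a reference odr-uses Y. Under C++14 this would additionally
// require an out-of-class definition such as:
//   template <int X> constexpr int T<X>::Y;
const int& refToY() { return T<42>::Y; }

static_assert(T<42>::Y == 42, "value is available at compile time");
```

Under C++17, refToY() returns a reference to the single inline definition of T<42>::Y, whose value is 42.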
Questions
Should I expect not to see the initializer expression in the AST above in clang++ -std=c++17?
If so, what's the best way to access the initializer for Y in T<42> foo from within an LLVM-9.x plugin?
Thanks! And please let me know if I've left out relevant information -- I'll be happy to update the question to provide it.