I'm writing a tool that traverses a clang AST and prints it out in a particular format, but I don't want to print function templates, just their (full) specializations. I'm wondering how to determine if a const CXXMethodDecl* represents a fully instantiated method (either it is not templated or it is a full instantiation of a templated method) or a not completely instantiated method. Here is an example of what I am seeing with my tool:
template <typename Z>
class P
{
    int lookup ();
};
template <typename Z>
int P<Z>::lookup ()
{
    return Z::f();
}
struct X {
    static int f();
};
template class P<X>;
This produces the following AST. I don't want to see the first CXXMethodDecl for lookup, because it is not fully instantiated, but I do want to see the second one (which is fully instantiated).
|-ClassTemplateDecl 0x3eff5b8 <minimal.cpp:1:1, line:5:1> line:2:7 Pte
| |-TemplateTypeParmDecl 0x3eff480 <line:1:11, col:20> col:20 typename depth 0 index 0 PTT
| |-CXXRecordDecl 0x3eff530 <line:2:1, line:5:1> line:2:7 class Pte definition
| | |-DefinitionData empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
| | | |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
| | | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveConstructor exists simple trivial needs_implicit
| | | |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
| | | |-MoveAssignment exists simple trivial needs_implicit
| | | `-Destructor simple irrelevant trivial needs_implicit
| | |-CXXRecordDecl 0x3eff800 <col:1, col:7> col:7 implicit class Pte
| | `-CXXMethodDecl 0x3eff910 <line:4:5, col:17> col:9 lookup 'int ()'
| `-ClassTemplateSpecialization 0x3f00010 'Pte'
|-CXXMethodDecl 0x3effbb0 parent 0x3eff530 prev 0x3eff910 <line:7:1, line:11:1> line:8:15 lookup 'int ()'
^^^^^^^^^^^^^^^^^^^^^^^^^^ I DO NOT WANT TO SEE THIS
| `-CompoundStmt 0x3effd50 <line:9:1, line:11:1>
| `-ReturnStmt 0x3effd40 <line:10:5, col:19>
| `-CallExpr 0x3effd20 <col:12, col:19> '<dependent type>'
| `-CXXDependentScopeMemberExpr 0x3effcd8 <col:12, col:17> '<dependent type>' lvalue ->f
|-CXXRecordDecl 0x3effd68 <line:13:1, line:15:1> line:13:8 referenced struct X definition
| |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
| | |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
| | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| | |-MoveConstructor exists simple trivial needs_implicit
| | |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
| | |-MoveAssignment exists simple trivial needs_implicit
| | `-Destructor simple irrelevant trivial needs_implicit
| |-CXXRecordDecl 0x3effe78 <col:1, col:8> col:8 implicit struct X
| `-CXXMethodDecl 0x3efff50 <line:14:5, col:18> col:16 used f 'int ()' static
`-ClassTemplateSpecializationDecl 0x3f00010 <line:17:1, col:21> col:16 class Pte definition
|-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
| |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
| |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| |-MoveConstructor exists simple trivial needs_implicit
| |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
| |-MoveAssignment exists simple trivial needs_implicit
| `-Destructor simple irrelevant trivial needs_implicit
|-TemplateArgument type 'X'
|-CXXRecordDecl 0x3f001f8 prev 0x3f00010 <line:2:1, col:7> col:7 implicit class Pte
`-CXXMethodDecl 0x3f00280 <line:8:1, line:11:1> line:4:9 lookup 'int ()'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I want to see this
`-CompoundStmt 0x3f2e1b0 <line:9:1, line:11:1>
`-ReturnStmt 0x3f2e1a0 <line:10:5, col:19>
`-CallExpr 0x3f2e180 <col:12, col:19> 'int'
`-ImplicitCastExpr 0x3f2e168 <col:12, col:17> 'int (*)()' <FunctionToPointerDecay>
`-DeclRefExpr 0x3f2e110 <col:12, col:17> 'int ()' lvalue CXXMethod 0x3efff50 'f' 'int ()'
Whenever a type, function, or variable is declared inside a template, or is itself a template, clang calls it a dependent context. You have probably seen the word "dependent" on many AST nodes within templates.
I've modified your original test case to include a templated method in a non-template type:
template <typename Z>
class P
{
    int lookup ();
};
template <typename Z>
int P<Z>::lookup ()
{
    return Z::f();
}
struct X {
    static int f() { return 42; }
};
struct Y {
    template <typename Z>
    int foo() { return Z::f(); }
};
template class P<X>;
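Here is a very simple recursive AST visitor that prints only the functions you are interested in:
#include "clang/AST/ASTConsumer.h"
#include "clang/AST/ASTContext.h"
#include "clang/AST/DeclCXX.h"
#include "clang/AST/RecursiveASTVisitor.h"

class NonDependentMethodVisitor
    : public clang::ASTConsumer,
      public clang::RecursiveASTVisitor<NonDependentMethodVisitor> {
public:
  // Called by the frontend once the whole translation unit has been parsed.
  void HandleTranslationUnit(clang::ASTContext &Context) override {
    this->TraverseTranslationUnitDecl(Context.getTranslationUnitDecl());
  }
  // Traverse template instantiations as well, not only the template patterns.
  bool shouldVisitTemplateInstantiations() const { return true; }
  bool VisitCXXMethodDecl(clang::CXXMethodDecl *MD) {
    // Skip methods that still depend on template parameters.
    if (!MD->isDependentContext()) {
      MD->dump();
    }
    return true;
  }
};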
which produces the following output for the test snippet:
CXXMethodDecl 0x384eda0 <$TEST_DIR/test.cpp:14:5, col:33> col:16 used f 'int ()' static
`-CompoundStmt 0x384ee80 <col:20, col:33>
`-ReturnStmt 0x384ee70 <col:22, col:29>
`-IntegerLiteral 0x384ee50 <col:29> 'int' 42
CXXMethodDecl 0x387e170 <$TEST_DIR/test.cpp:8:1, line:11:1> line:4:9 lookup 'int ()'
`-CompoundStmt 0x387e450 <line:9:1, line:11:1>
`-ReturnStmt 0x387e440 <line:10:5, col:17>
`-CallExpr 0x387e420 <col:12, col:17> 'int'
`-ImplicitCastExpr 0x387e408 <col:12, col:15> 'int (*)()' <FunctionToPointerDecay>
`-DeclRefExpr 0x387e3b0 <col:12, col:15> 'int ()' lvalue CXXMethod 0x384eda0 'f' 'int ()'
I hope this answers your question!
Related
I've been inspecting the types of expressions in the Clang AST and have found some oddities that I'd like someone to explain. Here is a piece of code:
struct C {
    int x{0};
};
void test(const int n) {
    C c;
    C a[3];
    C b[n];
    C x = (5, c);
    int foo[5][3] = {};
    auto p = new C[5];
    auto pp = new C[n];
    auto ppp = new C*[n]{};
}
Clang produces the following AST:
`-CompoundStmt 0x562aa86c4750 <col:24, line:14:1>
|-DeclStmt 0x562aa86943d8 <line:6:3, col:6>
| `-VarDecl 0x562aa8693d68 <col:3, col:5> col:5 used c 'C' callinit
| `-CXXConstructExpr 0x562aa86943b0 <col:5> 'C' 'void () noexcept'
|-DeclStmt 0x562aa8694620 <line:7:3, col:9>
| `-VarDecl 0x562aa8694488 <col:3, col:8> col:5 a 'C [3]' callinit
| `-CXXConstructExpr 0x562aa8694518 <col:5> 'C [3]' 'void () noexcept'
|-DeclStmt 0x562aa86c3340 <line:8:3, col:9>
| `-VarDecl 0x562aa86c3280 <col:3, col:8> col:5 b 'C [n]' callinit
| `-CXXConstructExpr 0x562aa86c3318 <col:5> 'C [n]' 'void () noexcept' <<< 1
|-DeclStmt 0x562aa86c3570 <line:9:3, col:15>
| `-VarDecl 0x562aa86c3368 <col:3, col:14> col:5 x 'C' cinit
| `-CXXConstructExpr 0x562aa86c3540 <col:9, col:14> 'C' 'void (const C &) noexcept'
| `-ImplicitCastExpr 0x562aa86c3450 <col:9, col:14> 'const C' lvalue <NoOp>
| `-ParenExpr 0x562aa86c3430 <col:9, col:14> 'C' lvalue
| `-BinaryOperator 0x562aa86c3410 <col:10, col:13> 'C' lvalue ','
| |-IntegerLiteral 0x562aa86c33d0 <col:10> 'int' 5
| `-DeclRefExpr 0x562aa86c33f0 <col:13> 'C' lvalue Var 0x562aa8693d68 'c' 'C'
|-DeclStmt 0x562aa86c37b8 <line:10:3, col:21>
| `-VarDecl 0x562aa86c36c0 <col:3, col:20> col:7 foo 'int [5][3]' cinit
| `-InitListExpr 0x562aa86c3768 <col:19, col:20> 'int [5][3]'
| `-array_filler: ImplicitValueInitExpr 0x562aa86c37a8 <<invalid sloc>> 'int [3]'
|-DeclStmt 0x562aa86c4178 <line:11:3, col:20>
| `-VarDecl 0x562aa86c3830 <col:3, col:19> col:8 p 'C *':'C *' cinit
| `-CXXNewExpr 0x562aa86c4050 <col:12, col:19> 'C *' array Function 0x562aa86c3ad0 'operator new[]' 'void *(unsigned long)'
| |-ImplicitCastExpr 0x562aa86c38c8 <col:18> 'unsigned long' <IntegralCast>
| | `-IntegerLiteral 0x562aa86c3898 <col:18> 'int' 5
| `-CXXConstructExpr 0x562aa86c4028 <col:16> 'C [5]' 'void () noexcept'
|-DeclStmt 0x562aa86c43f8 <line:12:3, col:21>
| `-VarDecl 0x562aa86c41c8 <col:3, col:20> col:8 pp 'C *':'C *' cinit
| `-CXXNewExpr 0x562aa86c4330 <col:13, col:20> 'C *' array Function 0x562aa86c3ad0 'operator new[]' 'void *(unsigned long)'
| |-ImplicitCastExpr 0x562aa86c4290 <col:19> 'unsigned long' <IntegralCast>
| | `-ImplicitCastExpr 0x562aa86c4260 <col:19> 'int' <LValueToRValue>
| | `-DeclRefExpr 0x562aa86c4230 <col:19> 'const int' lvalue ParmVar 0x562aa8693b70 'n' 'const int'
| `-CXXConstructExpr 0x562aa86c4308 <col:17> 'C []' 'void () noexcept' <<< 2
`-DeclStmt 0x562aa86c4738 <line:13:3, col:25>
`-VarDecl 0x562aa86c4448 <col:3, col:24> col:8 ppp 'C **':'C **' cinit
`-CXXNewExpr 0x562aa86c4638 <col:14, col:24> 'C **' array Function 0x562aa86c3ad0 'operator new[]' 'void *(unsigned long)'
|-ImplicitCastExpr 0x562aa86c4560 <col:21> 'unsigned long' <IntegralCast>
| `-ImplicitCastExpr 0x562aa86c4548 <col:21> 'int' <LValueToRValue>
| `-DeclRefExpr 0x562aa86c44b0 <col:21> 'const int' lvalue ParmVar 0x562aa8693b70 'n' 'const int'
`-InitListExpr 0x562aa86c45a8 <col:23, col:24> 'C *[0]' <<< 3
`-array_filler: ImplicitValueInitExpr 0x562aa86c4628 <<invalid sloc>> 'C *'
The oddities / inconsistencies seem to occur with the types of the initializing expressions for new (the last two lines).
In C b[n] (1 above), the type of the CXXConstructExpr is C [n], but in the CXXNewExpr corresponding to new C[n] (2 above), the (effectively identical) CXXConstructExpr has type C [].
In the last line, the InitListExpr (3 above) has type C *[0], but in C++ this is not a valid type (arrays cannot have length 0). As above, I would expect C *[n] (though C *[] would be reasonable as well).
What does the field "0x671xxxx" in the AST dump mean? Which APIs can I use to get at this field? (The example program is as follows. Thanks!)
t12.b = 3;
|-BinaryOperator 0x6712768 <line:22:2, col:10> 'int' '='
| |-MemberExpr 0x6712718 <col:2, col:6> 'int' lvalue .b 0x6711e00
| | `-DeclRefExpr 0x67126f8 <col:2> 'struct test1':'struct test1' lvalue Var 0x6712260 't12' 'struct test1':'struct test1'
| `-IntegerLiteral 0x6712748 <col:10> 'int' 3
I'd like to write an analyzer that counts virtual function calls by looking at the C++ AST (output of -ast-dump), but I'm having difficulty determining which function calls are virtual and which are not. Here's an example piece of code:
struct A {
    A() {}
    virtual int foo() { return 0; }
};
struct B : public A {
    B() {}
    //virtual int foo() { return 3; }
};
struct C : public B {
    C() {}
    virtual int foo() { return 1; }
};
int test(C* c) {
    return c->foo() + c->B::foo() + c->A::foo();
}
In my initial implementation, I simply check whether or not the function being called is virtual using expr->getMethodDecl()->isVirtual() but (from what I understand) the calls in c->B::foo() and c->A::foo() are not actually virtual so it isn't sufficient to ask if the function being called is virtual.
When I dump the AST from this piece of code, I get the following tree for test:
`-FunctionDecl 0x1d33620 <line:19:1, line:21:1> line:19:5 test 'int (C *)'
|-ParmVarDecl 0x1d33558 <col:10, col:13> col:13 used c 'C *'
`-CompoundStmt 0x1d339f0 <col:16, line:21:1>
`-ReturnStmt 0x1d339e0 <line:20:5, col:47>
`-BinaryOperator 0x1d339c0 <col:12, col:47> 'int' '+'
|-BinaryOperator 0x1d338a8 <col:12, col:33> 'int' '+'
| |-CXXMemberCallExpr 0x1d33778 <col:12, col:19> 'int'
| | `-MemberExpr 0x1d33748 <col:12, col:15> '<bound member function type>' ->foo 0x1d31c80
| | `-ImplicitCastExpr 0x1d33730 <col:12> 'C *' <LValueToRValue>
| | `-DeclRefExpr 0x1d33710 <col:12> 'C *' lvalue ParmVar 0x1d33558 'c' 'C *'
| `-CXXMemberCallExpr 0x1d33848 <col:23, col:33> 'int'
| `-MemberExpr 0x1d33800 <col:23, col:29> '<bound member function type>' ->foo 0x1d02400
| `-ImplicitCastExpr 0x1d33888 <col:23> 'A *' <UncheckedDerivedToBase (A)>
| `-ImplicitCastExpr 0x1d33868 <col:23> 'B *' <UncheckedDerivedToBase (B)>
| `-ImplicitCastExpr 0x1d337d8 <col:23> 'C *' <LValueToRValue>
| `-DeclRefExpr 0x1d33798 <col:23> 'C *' lvalue ParmVar 0x1d33558 'c' 'C *'
`-CXXMemberCallExpr 0x1d33978 <col:37, col:47> 'int'
`-MemberExpr 0x1d33930 <col:37, col:43> '<bound member function type>' ->foo 0x1d02400
`-ImplicitCastExpr 0x1d33998 <col:37> 'A *' <UncheckedDerivedToBase (B -> A)>
`-ImplicitCastExpr 0x1d33908 <col:37> 'C *' <LValueToRValue>
`-DeclRefExpr 0x1d338c8 <col:37> 'C *' lvalue ParmVar 0x1d33558 'c' 'C *'
From looking at this, it seems that the UncheckedDerivedToBase cast marks the places where the function call is non-virtual. Is this always the case? Should I always consider a call of the form CXXMemberCallExpr (MemberExpr (ImplicitCastExpr<UncheckedDerivedToBase> e)) to be a non-virtual call? Are there other patterns that would indicate non-virtual function calls? Is there a more robust way to determine this?
EDIT: Some more investigation suggests that the UncheckedDerivedToBase hypothesis above is not sufficient. This code:
struct A {
    virtual int foo() { return 100; }
};
struct B : public A {
    virtual int foo() { return 10; }
};
int test(B* b) {
    return b->foo() + b->B::foo();
}
seems to produce exactly the same AST node (at least indistinguishable on the console) for both calls, but according to the standard the semantics should differ if b actually points to an object of a derived class, e.g. C above.
The distinguishing factor is hasQualifier() on the MemberExpr that is the callee of the CXXMemberCallExpr. If hasQualifier() returns true, as it does for b->B::foo(), then the function call is non-virtual.
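To illustrate why the qualifier matters, here is a small standalone sketch in plain C++ (no Clang API involved). The struct names mirror the ones in the question; the helper functions unqualified_call and qualified_call are hypothetical names introduced only for this illustration.

```cpp
// Standalone illustration; no Clang headers are needed.
struct A {
    virtual ~A() = default;
    virtual int foo() { return 100; }
};
struct B : public A {
    int foo() override { return 10; }
};
// Hypothetical further-derived class, like C in the first example.
struct C : public B {
    int foo() override { return 1; }
};

// Unqualified member call: dispatched dynamically on the object's actual type.
int unqualified_call(B* b) { return b->foo(); }

// Qualified member call: B::foo is selected statically, so no virtual dispatch.
int qualified_call(B* b) { return b->B::foo(); }
```

When a C object is passed through a B*, unqualified_call reaches C::foo while qualified_call always runs B::foo, which is the difference a check on the qualifier is meant to capture.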
As far as I can tell, ANTLR4's listener methods seem to give direct access only to TerminalNodes, that is, the lexer tokens.
However, I would now like to output information about parser rules such as these:
type :
primitiveType
| referencedType
| arrayType
| listType
| mapType
| 'void'
;
primitiveType :
'byte'
| 'short'
| 'int'
| 'long'
| 'char'
| 'float'
| 'double'
| 'boolean'
;
referencedType :
'String'
| 'CharSequence'
| selfdefineType
;
First of all, I want to figure out how to directly get the contents of primitiveType and output text such as byte or short without turning it into a lexer rule (TerminalNode). I've checked the code of aidlParser.java (aidl.g4 is my initial grammar file).
Second, I want to know whether there is a way to find out what a parser rule actually matched. E.g., I want to know which alternative of type (primitiveType, referencedType, ...) was used when matching a type, without having to visit each sub-node of type (that is, each rule's listener method) and see which one contains something.
Here is the entire code of my .g4 file:
grammar aidl;
//parser
//file
file : packageDeclaration* importDeclaration* parcelableDeclaration? interfaceDeclaration? ;
//packageDeclaration
packageDeclaration :'package' packageName ';';
packageName : Identifier
|
packageName '.' Identifier;
// importDeclaration
importDeclaration
: 'import' importName ';'
;
importName : Identifier
|
importName '.' Identifier;
//parcelableDeclaration
parcelableDeclaration : 'parcelable' parcelableName ';' ;
parcelableName : Identifier ;
//interfaceDeclaration
interfaceDeclaration : interfaceTag? 'interface' interfaceName '{' methodsDeclaration+ '}' ;
interfaceTag : 'oneway' ;
interfaceName : Identifier ;
// methodsDeclaration
methodsDeclaration : methodTag? returnType methodName '(' parameters? ')' ';' ;
methodName : Identifier ;
methodTag: 'oneway';
returnType : type ;
// parameters
parameters
: parameter (',' parameter)*
;
parameter
: parameterTag? parameterType parameterName ;
parameterType : type ;
parameterName : Identifier;
parameterTag : 'in' | 'out' | 'inout' ;
// type
type :
primitiveType
| referencedType
| arrayType
| listType
| mapType
| 'void'
;
primitiveType :
'byte'
| 'short'
| 'int'
| 'long'
| 'char'
| 'float'
| 'double'
| 'boolean'
;
referencedType :
'String'
| 'CharSequence'
| selfdefineType
;
selfdefineType : Identifier;
arrayType : primitiveType dims
| referencedType dims
;
listType : 'List' ('<' (primitiveType | referencedType) (',' (primitiveType | referencedType))* '>')?;
mapType : 'Map' ('<' (primitiveType | referencedType) (',' (primitiveType | referencedType))* '>')?;
dims
: '[' ']' ( '[' ']')*
;
//Lexer
// Identifier
Identifier
: JavaLetter JavaLetterOrDigit*
;
fragment
JavaLetter
: [a-zA-Z$_] // these are the "java letters" below 0x7F
| // covers all characters above 0x7F which are not a surrogate
~[\u0000-\u007F\uD800-\uDBFF]
{Character.isJavaIdentifierStart(_input.LA(-1))}?
| // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
[\uD800-\uDBFF] [\uDC00-\uDFFF]
{Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
;
fragment
JavaLetterOrDigit
: [a-zA-Z0-9$_] // these are the "java letters or digits" below 0x7F
| // covers all characters above 0x7F which are not a surrogate
~[\u0000-\u007F\uD800-\uDBFF]
{Character.isJavaIdentifierPart(_input.LA(-1))}?
| // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
[\uD800-\uDBFF] [\uDC00-\uDFFF]
{Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
;
WS : [ \t\r\n\u000C]+ -> skip
;
I would sincerely be grateful for your help in time!
Once your parse run is over you will get a parse tree. You can walk that tree down to the nodes you are interested in (usually you use a parse tree listener for that and only override the enter*/exit* methods that are relevant for your problem). In your enterPrimitiveType method you get a PrimitiveTypeContext parameter. Use its getText method to get the text it matched.
For your second question you would do exactly the same, just use the enterType method instead. The TypeContext parameter has an accessor for each rule referenced as an alternative (primitiveType(), referencedType(), and so on). Check which one is not null to see which actually matched; if all of them are null, the 'void' alternative matched.
In an LLVM-9.x plugin, I'd like to access the initializer (= 42) associated with variable Y in the template specialization T<42> foo:
template<int X>
class T {
    //Clang produces different ASTs on this code in C++14 and C++17
    //(in 17, there's no initializer for Y (= 42) in the specialized
    //class foo below).
    static constexpr int Y = X;
};
T<42> foo;
With clang++ -std=c++14 I get an AST (excerpted) that looks like:
> clang++ -std=c++14 -Xclang -ast-dump -fsyntax-only file.cpp
|-ClassTemplateDecl 0x563e38c10980 <test_constexpr.cpp:1:1, line:7:1> line:2:7 T
...
| | `-VarDecl 0x563e38c10c98 <line:6:3, col:28> col:24 Y 'const int' static constexpr cinit
| | `-DeclRefExpr 0x563e38c10d00 <col:28> 'int' NonTypeTemplateParm 0x563e38c10878 'X' 'int'
| `-ClassTemplateSpecializationDecl 0x563e38c10d78 <line:1:1, line:7:1> line:2:7 class T
...
| |-VarDecl 0x563e38c110c8 <line:6:3, col:28> col:24 Y 'const int' static constexpr cinit
| | `-SubstNonTypeTemplateParmExpr 0x563e38c11160 <col:28> 'int' <== MISSING in C++17 BELOW
| | `-IntegerLiteral 0x563e38c11140 <col:28> 'int' 42 <== MISSING in C++17 BELOW
and things work fine (assuming decl is the second VarDecl for Y, I'm able to access 42 by decl->getInit()).
With clang++ -std=c++17, however, I get the new AST:
> clang++ -std=c++17 -Xclang -ast-dump -fsyntax-only file.cpp
|-ClassTemplateDecl 0x56114838ff40 <test_constexpr.cpp:1:1, line:7:1> line:2:7 T
...
| | `-VarDecl 0x561148390258 <line:6:3, col:28> col:24 Y 'const int' static inline constexpr cinit
| | `-DeclRefExpr 0x5611483902c0 <col:28> 'int' NonTypeTemplateParm 0x56114838fe38 'X' 'int'
| `-ClassTemplateSpecializationDecl 0x561148390338 <line:1:1, line:7:1> line:2:7 class T
...
| |-VarDecl 0x561148390688 <line:6:3, col:24> col:24 Y 'const int' static constexpr
in which the initializing expression for Y (-IntegerLiteral 0x563e38c11140 <col:28> 'int' 42) no longer appears. Including a use of Y like int z = Y in the class definition causes the initializer to reappear.
Questions
Should I expect not to see the initializer expression in the AST above in clang++ -std=c++17?
If so, what's the best way to access the initializer for Y in T<42> foo from within an LLVM-9.x plugin?
Thanks! And please let me know if I've left out relevant information -- I'll be happy to update the question to provide it.