Parse data structures clang/LLVM

I was wondering what the best way is to parse C source files and extract their data structures. Suppose that I have:
typedef int M_Int;
typedef float* P_Float;
typedef struct Foo {
M_Int a;
P_Float p_f;
} Foo;
What is the best way to unfold the data structures in order to get the primitives of both variables a and p_f of struct Foo?
Parsing the AST, for very simple examples, could be the best way, but when the code becomes more complex, maybe it's better to work in a more low-level way with IR code?

You can use LLVM debug info to get the information you need. If you compile the C code with the -g option, the compiler emits debug info that contains the type information. Understanding LLVM debug info is tricky, mostly because there is little documentation about its structure and how to access it. Here are some links:
1) http://llvm.org/docs/SourceLevelDebugging.html
2) Here is a link to a project that I am working on which uses debug info. This might not be too useful as there is not much documentation but it might be useful to see the usage of the debuginfo classes. We are trying to get field names for all pointer parameters (including field names in case of structure parameter) of a C function. All of the code related to debuginfo access is in this file: https://github.com/jiten-thakkar/DataStructureAnalysis/blob/dsa_llvm3.8/lib/dsaGenerator/DSAGenerator.cpp

To find the underlying types, the AST is a good level to work at. Clang can automate and scale this process with AST Matchers and Callbacks, used in conjunction with libtooling. For example, the AST matcher combination
fieldDecl( hasType( typedefType().bind("typedef") ) ).bind("field")
will match fields in C structs that are declared with a typedef instead of a built-in type. The bind() calls make AST nodes accessible to a Callback. Here's a Callback whose run() method gets the underlying type of the field declaration:
virtual void run(clang::ast_matchers::MatchFinder::MatchResult const & result) override
{
using namespace clang;
FieldDecl * f_decl =
const_cast<FieldDecl *>(result.Nodes.getNodeAs<FieldDecl>("field"));
TypedefType * tt = const_cast<TypedefType *>(
result.Nodes.getNodeAs<TypedefType>("typedef"));
if(f_decl && tt) {
QualType ut = tt->getDecl()->getUnderlyingType();
TypedefNameDecl * tnd = tt->getDecl();
std::string struct_name = f_decl->getParent()->getNameAsString();
std::string fld_name = f_decl->getNameAsString();
std::string ut_name = ut.getAsString();
std::string tnd_name = tnd->getNameAsString();
std::cout << "Struct '" << struct_name << "' declares field '"
<< fld_name << "' with typedef name = '" << tnd_name << "'"
<< ", underlying type = '" << ut_name << "'" << std::endl;
}
else {
// error handling
}
return;
} // run
Once this is put into a Clang Tool and built, running
typedef-report Foo.h -- # Note two dashes
produces
Struct 'Foo' declares field 'a' with typedef name = 'M_Int', underlying type = 'int'
Struct 'Foo' declares field 'p_f' with typedef name = 'P_Float', underlying type = 'float *'
I put up a full working example app in a Code Analysis and Refactoring Examples with Clang Tools project (see apps/TypedefFinder.cc).
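Independently of Clang, the output above can be cross-checked with the compiler's own view of the types. The following is a minimal sketch in standard C++17, not part of the tool; the helper unfolds_to is a hypothetical name introduced only for this illustration.

```cpp
#include <type_traits>

// The declarations from the question, unchanged.
typedef int M_Int;
typedef float* P_Float;
typedef struct Foo {
    M_Int a;
    P_Float p_f;
} Foo;

// Typedefs are aliases, so each field's type is identical to its underlying type.
static_assert(std::is_same_v<decltype(Foo::a), int>, "a unfolds to int");
static_assert(std::is_same_v<decltype(Foo::p_f), float*>, "p_f unfolds to float *");

// Hypothetical helper: true when the field type Field is exactly Expected.
template <class Field, class Expected>
constexpr bool unfolds_to() {
    return std::is_same_v<Field, Expected>;
}
```

This only confirms the expected answers for a known struct; it does not discover fields, which is what the AST matcher is for.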

Related

Does the using declaration allow for incomplete types in all cases?

I'm a bit confused about the implications of the using declaration. The keyword implies that a new type is merely declared. This would allow for incomplete types. However, in some cases it is also a definition, no? Compare the following code:
#include <variant>
#include <iostream>
struct box;
using val = std::variant<std::monostate, box, int, char>;
struct box
{
int a;
long b;
double c;
box(std::initializer_list<val>) {
}
};
int main()
{
std::cout << sizeof(val) << std::endl;
}
In this case I'm defining val to be some instantiation of variant. Is this undefined behaviour? If the using-declaration is in fact a declaration and not a definition, incomplete types such as box would be allowed to instantiate the variant type. However, if it is also a definition, it would be UB no?
For the record, both gcc and clang print "32" as output.

Since you've not included language-lawyer, I'm attempting a non-lawyer answer.
Why should that be UB?
With a using declaration, you're just providing a synonym for std::variant<whatever>. That doesn't require an instantiation of the object, nor of the class std::variant, much like a function declaration with a parameter of that class doesn't require it:
void f(val); // just fine
The problem would occur as soon as you give to that function a definition (if val is still incomplete because box is still incomplete):
void f(val) {}
But it's enough just to change val to val& for allowing a definition,
void f(val&) {}
because the compiler doesn't need to know anything else of val than its name.
Furthermore, and here I'm really guessing, "incomplete type" means that some definition is lacking at the point where it's needed, so I expect you would discover such an issue at compile/link time rather than being hit by UB. After all, how could the compiler and linker even finish their job successfully if a definition they needed wasn't found?
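The points above can be sketched without std::variant. The example below is an assumption-laden illustration: holder is a hypothetical stand-in for a class template, and take and held_size are names invented for this sketch. It shows that naming a specialization of an incomplete type, and declaring a function taking it by reference, both compile before box is defined.

```cpp
#include <cstddef>

struct box;                        // incomplete at this point

// Hypothetical stand-in for a class template such as std::variant.
template <class T>
struct holder {
    T value;                       // needs T complete only when holder<T> is instantiated
};

using held = holder<box>;          // names the specialization; nothing is instantiated yet
void take(held&);                  // declaration only; a reference needs just the name

struct box {
    int a;
    long b;
    double c;
};

// box is complete here, so sizeof can instantiate holder<box>.
inline std::size_t held_size() {
    return sizeof(held);
}
```

Moving held_size above the definition of box would make the compiler reject the program, which matches the expectation that such problems surface at compile time for user-defined templates.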

How to declare const char* in .cpp file

There are many questions about declaring const string in .h files, this is not my case.
I need string (for serialization purposes if it is important) to use in
My current solution is
// file.cpp
static constexpr const char* const str = "some string key";
void MyClass::serialize()
{
// using str
}
void MyClass::deserialize()
{
// using str
}
Does it have any problems? (i.e. memory leaks, redefinitions, UB, side effects)?
P.S. Would using #define KEY "key" be better here (speed/memory/consistency)?

Since you mentioned C++17, the best way to do this is with:
constexpr std::string_view str = "some string key";
Because str is a constant expression, the compiler can substitute its value at compile time wherever it is used.
Memory-wise, the compiler typically does not need to keep a separate str object around at run time, although the characters of the literal still live in the binary.
Speed-wise this can also be marginally better, because there are fewer indirections to reach the data at run time.
Consistency-wise it is better too, since constexpr is used only for expressions that are immutable and available at compile time. string_view is likewise used only for immutable strings, so you are using exactly the data type you need.

constexpr implies the latter const, which in turn implies the static (for a namespace-scope variable). Aside from that redundancy, this is fine.
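To make the two suggestions concrete, here is a small sketch of both spellings side by side, assuming C++17. The names kKeyPtr, kKeyView, serialize_key and deserialize_key are hypothetical and stand in for the key and the MyClass member functions from the question.

```cpp
#include <string_view>

namespace {
// The unnamed namespace gives internal linkage; constexpr already makes both objects const.
constexpr const char* kKeyPtr = "some string key";
constexpr std::string_view kKeyView = "some string key";
}  // namespace

// The view carries its length, so the length is a constant expression.
static_assert(kKeyView.size() == 15, "length is known at compile time");

// Hypothetical accessors standing in for MyClass::serialize() and deserialize().
inline std::string_view serialize_key() {
    return kKeyView;
}
inline std::string_view deserialize_key() {
    return std::string_view(kKeyPtr);
}
```

Either spelling stays private to the .cpp file; the string_view version additionally avoids a strlen-style scan whenever the length is needed.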

Can I get bison to make yytname externally visible?

Bison generates a table of tag names when processing my grammar, something like
static const char *const yytname[] =
{
"$end", "error", "$undefined", "TAG", "SCORE",
...
}
The static keyword keeps yytname from being visible to other parts of the code.
This would normally be harmless, but I want to format my own syntax error messages instead of relying on the ones provided to my yyerror function.
My makefile includes the following rule:
chess1.tab.c: chess.tab.c
sed '/^static const.*yytname/s/static//' $? > $@
This works, but it's not what I'd call elegant.
Is there a better way to get at the table of tag names?

You can export the table using a function which you add to your parser file:
%token-table
%code provides {
const char* const* get_yytname(void);
}
...
%%
...
%%
const char* const* get_yytname(void) { return yytname; }
You probably also want to re-export some of the associated constants.
Alternatively, you could write a function which takes a token number and returns the token name. That does a better job of encapsulation; the existence of the string table and its precise type are implementation details.
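The encapsulated alternative might look roughly like the sketch below. It is written against a stand-in table so that it compiles on its own: kTokenNames and token_name are hypothetical names, and in a real parser file the function body would index yytname instead. Note that yytname is indexed by Bison's internal symbol number, so a real version would need to map token numbers accordingly.

```cpp
#include <cstddef>
#include <string_view>

namespace {
// Stand-in for Bison's generated yytname table (entries copied from the question).
const char* const kTokenNames[] = {"$end", "error", "$undefined", "TAG", "SCORE"};
constexpr std::size_t kTokenCount = sizeof(kTokenNames) / sizeof(kTokenNames[0]);
}  // namespace

// Hypothetical accessor: returns the name for a symbol index,
// or a fallback string when the index is out of range.
std::string_view token_name(int symbol) {
    if (symbol < 0 || static_cast<std::size_t>(symbol) >= kTokenCount) {
        return "<unknown>";
    }
    return kTokenNames[symbol];
}
```

Callers then depend only on token_name, so the table can stay static and its exact type remains an implementation detail.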

Convert template parameter into comma-separated list of template parameters

Apologies if the title is misleading or if this question has been answered before.
I'm working with Eigen's Tensor module, particularly the Eigen::TensorFixedSize class as I know the shape at compile time.
Essentially, because this is a Lorentz problem, a rank-2 tensor would go like,
Eigen::TensorFixedSize<double, Eigen::Sizes<4,4>> t;
a rank-3 tensor,
Eigen::TensorFixedSize<double, Eigen::Sizes<4,4,4>> t;
and so on.
I'd like to write a class that is able to initialise a tensor depending on the rank. In pseudo-code,
template<typename RANK>
class Foo
{
public:
...
private:
Eigen::TensorFixedSize<double, Eigen::Sizes<4,4,4,...,RANK times>> _t;
}
somehow converting the template parameter from
<2> --> <4,4>
<3> --> <4,4,4>
up to an arbitrary unsigned int in <N>.
Would this be possible to do?

Yup.
template <class RankIdx>
struct TensorForRank;
template <std::size_t... RankIdx>
struct TensorForRank<std::index_sequence<RankIdx...>> {
using type = Eigen::TensorFixedSize<double, Eigen::Sizes<(void(RankIdx), 4)...>>;
};
template <std::size_t Rank>
using TensorForRank_t = typename TensorForRank<std::make_index_sequence<Rank>>::type;
Use as:
template<std::size_t Rank>
class Foo
{
// ...
private:
TensorForRank_t<Rank> _t;
};
See it live on Wandbox (with a placeholder test<...> template as Eigen is not available)
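For readers without Eigen at hand, the same trick can be checked in isolation. This is a sketch under the assumption that a placeholder template behaves like Eigen::Sizes for this purpose; Sizes, RepeatFour and SizesForRank are hypothetical names used only here.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for Eigen::Sizes, so the sketch compiles without Eigen.
template <std::ptrdiff_t... Dims>
struct Sizes {
    static constexpr std::size_t rank = sizeof...(Dims);
};

template <class Seq>
struct RepeatFour;

// Each index I is discarded by the comma expression and replaced with 4.
template <std::size_t... I>
struct RepeatFour<std::index_sequence<I...>> {
    using type = Sizes<(void(I), 4)...>;
};

template <std::size_t Rank>
using SizesForRank = typename RepeatFour<std::make_index_sequence<Rank>>::type;

static_assert(std::is_same_v<SizesForRank<2>, Sizes<4, 4>>, "rank 2");
static_assert(std::is_same_v<SizesForRank<3>, Sizes<4, 4, 4>>, "rank 3");
```

The pack expansion produces one 4 per index, so the rank of the result always equals the template argument.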

Quentin's answer is very good, and what I'd go with.
The only downside is the "useless" generation of an index sequence [0, 1, 2, ...] whose values we ignore, and substitute for our own.
If we want to directly create the repeated values, we can write our own generator code (which is quite a bit more verbose):
Start with creating a type that can hold a number of std::size_t values by aliasing a std::integer_sequence:
template<std::size_t... vals>
using value_sequence = std::integer_sequence<std::size_t, vals...>;
The goal is to ultimately create a value_sequence<4, 4, 4> and then instantiate an Eigen::Sizes using those 4s.
The next thing we need to be able to do is concatenate two sequences, because we're going to build it up like so:
concat(value_sequence<4>, value_sequence<4>) --> value_sequence<4, 4>
We can do this via a stub method that accepts two value_sequence types and returns the concatenated result. Note that we do not ever write a definition for this method; we're simply taking advantage of the type system to write less code than a template specialization would take:
template<std::size_t... lhs, std::size_t... rhs>
constexpr auto concat(value_sequence<lhs...>, value_sequence<rhs...>) -> value_sequence<lhs..., rhs...>;
At this point we have enough machinery to create a value_sequence<4,4,4>, so now we need a way to indicate the value we wish to use (4) and the number of times to repeat it (3) to produce it:
template<std::size_t value, std::size_t num_repeats>
struct repeated_value
{
using left_sequence = value_sequence<value>;
using right_sequence = typename repeated_value<value, num_repeats-1>::type;
using type = decltype(concat(left_sequence{}, right_sequence{}));
};
repeated_value<4, 3>::type produces a value_sequence<4, 4, 4>.
Since repeated_value<...>::type is recursive, we need to provide a base case via partial specialization:
template<std::size_t value>
struct repeated_value<value, 1>
{
using type = value_sequence<value>;
};
Great. All that's left is for us to receive an Eigen::Sizes class and a value_sequence<4, 4, 4> type, and produce Eigen::Sizes<4, 4, 4>.
We can do this with partial template specialization again:
template<template<std::size_t...> class T, class...>
struct InstantiateWithRepeatedVals;
template<template<std::size_t...> class T, std::size_t... vals>
struct InstantiateWithRepeatedVals<T, value_sequence<vals...>>
{
using type = T<vals...>;
};
That's it! Throw in a few helpers to make using it easier, and we're done:
template<std::size_t value, std::size_t num_repeats>
using repeated_value_t = typename repeated_value<value, num_repeats>::type;
template<template<std::size_t...> class T, std::size_t Value, std::size_t N>
using InstantiateWithRepeatedVals_t = typename InstantiateWithRepeatedVals<T, repeated_value_t<Value, N>>::type;
Now we can use it like so:
using my_type = InstantiateWithRepeatedVals_t<EigenSizes, 4, 3>;
static_assert(std::is_same_v<my_type, EigenSizes<4, 4, 4>>);

Is it possible LibTooling doesn't change headers?

I have a LibTooling tool (TimeFlag) that adds a flag to every ForStmt/WhileStmt. I run ./TimeFlag lalala.cpp -- to insert the flags into lalala.cpp.
Unfortunately, the tool also modifies headers, even system library headers.
Is there a way to make LibTooling process only the input file?

Here are two possibilities: if using a RecursiveASTVisitor, one could use the SourceManager to determine if the location of the statement or declaration is in the main expansion file:
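clang::SourceManager &sm(astContext->getSourceManager());
// Note: getLocStart() is named getBeginLoc() in Clang 8 and later.
bool const inMainFile(
sm.isInMainFile( sm.getExpansionLoc( stmt->getLocStart())));
if(inMainFile){
/* process decl or stmt */
}
else{
// Stmt has no getNameAsString(); report the statement class name instead.
std::cout << "'" << stmt->getStmtClassName() << "' is not in main file\n";
}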
There are several similar methods in SourceManager, such as isInSystemHeader to assist with this task.
If you are using AST matchers, you can use isExpansionInMainFile to narrow which nodes it matches:
auto matcher = forStmt( isExpansionInMainFile());
There is a similar matcher, isExpansionInSystemHeader.
