Using ClangTool to parse a string - clang

The following code successfully sets up a ClangTool for parsing:
std::vector<std::string> source_files;
cl::OptionCategory option_category("options");
CommonOptionsParser options_parser(argc, argv, option_category);
ClangTool tool(options_parser.getCompilations(), source_files);
However, I created argc and argv on my own -- they didn't come from main(). Instead of an input file, is it possible to get the ClangTool to parse the contents of a string? E.g.,
const char *str = "extern int foo;\n";
mapVirtualFile() looked like what I wanted, but when I tried, foo didn't come out of it.
tool.mapVirtualFile(StringRef("faux_file.c"), StringRef(str));
// Then proceed to tool.run(...), etc.
I could create a temporary file in /tmp, but I'd like to simply read from the string.
Update:
Creating a temporary file works. It's kind of hack-ish, but it's stable and consistent. It would be nice to have a way of inputting from a string, but it doesn't seem possible with the existing implementation.

Related

How to detect if global variable is a string in LLVM?

In earlier releases of LLVM/Clang I could detect whether a global variable was a string by calling, e.g., GlobalVar->getName() and checking whether the name ends with ".str". I've tried this in LLVM/Clang 13 and 14, and all the names I get appear to be mangled. Am I missing something?
For example, I have this basic C source code:
//compiled with: clang.exe -std=c99 helloCC.c -o helloCC.exe -mllvm -my_get_strings=1 -flegacy-pass-manager
#include <stdio.h>
char *xmy1 = "hello world";
int main(int argc, char *argv[]) {
printf("%s", xmy1);
return 0;
}
I've manually edited the LLVM/Clang code to trigger my function as one of the passes: clang is executed with "-flegacy-pass-manager", and I've added my pass to PassManagerBuilder.cpp in the void PassManagerBuilder::populateModulePassManager(legacy::PassManagerBase &MPM) function.
Anyway, my runOnModule handler executes and iterates over the global variables (M.global_begin() to M.global_end()), and all the names returned by GlobalVar->getName() seem to be mangled:
found global = "??_C@_0M@LACCCNMM@hello?5world?$AA@"
Obviously, my previous approach to detecting whether this is a string no longer works. Is there a better function to detect whether a global is a string, or am I doing something wrong?
I've tried demangling the name. I can demangle it, but I still don't know how to verify whether this is a string or not. Is there an LLVM function for it?
Well, the main question here is what you mean by "global variable is a string". If you mean C-style strings, then take the global's initializer (which is a Constant) and check whether it is a C-style string using the isCString method of ConstantDataSequential (https://llvm.org/doxygen/classllvm_1_1ConstantDataSequential.html#aecff3ad6cfa0e4abfd4fc9484d973e7d).

How to declare const char* in .cpp file

There are many questions about declaring const string in .h files, this is not my case.
I need string (for serialization purposes if it is important) to use in
My current solution is
// file.cpp
static constexpr const char* const str = "some string key";
void MyClass::serialize()
{
// using str
}
void MyClass::deserialize()
{
// using str
}
Does it have any problems? (i.e. memory leaks, redefinitions, UB, side effects)?
P.S. Could using #define KEY "key" be better here (speed/memory/consistency)?
Since you mentioned C++17, the best way to do this is with:
constexpr std::string_view str = "some string key";
Because str is a compile-time constant, the compiler can substitute it directly at the places where it is used.
Memory-wise, the view object itself can usually be optimized away, although the characters of the literal still have to be stored in the binary wherever they are needed at run time.
Speed-wise, this can also be marginally better: the length is known at compile time, and there may be fewer indirections to reach the data at run time.
Consistency-wise it is better too, since constexpr is used solely for values that are immutable and available at compile time, and string_view is used solely for non-owning, read-only access to strings, so you are using exactly the data type you need.
constexpr implies the latter const, which in turn implies the static (for a namespace-scope variable). Aside from that redundancy, this is fine.
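As a rough illustration of the advice above, here is a minimal sketch of a translation-unit-local key in C++17. The name kKey, the "key=value" format, and the free functions serialize and deserialize are assumptions made for the example; they stand in for MyClass::serialize and MyClass::deserialize, whose real bodies are not shown in the question.

```cpp
// file.cpp (sketch): a key that is private to this translation unit.
#include <string>
#include <string_view>

namespace {
// constexpr implies const; the unnamed namespace gives internal linkage,
// so no other translation unit can see or clash with this name.
constexpr std::string_view kKey = "some string key";
}  // namespace

// Hypothetical stand-in for MyClass::serialize: writes "<key>=<value>".
std::string serialize(std::string_view value) {
    std::string out(kKey);
    out += '=';
    out.append(value);
    return out;
}

// Hypothetical stand-in for MyClass::deserialize: returns the value part
// if `line` starts with "<key>=", otherwise an empty string.
std::string deserialize(std::string_view line) {
    if (line.size() > kKey.size() &&
        line.substr(0, kKey.size()) == kKey &&
        line[kKey.size()] == '=') {
        return std::string(line.substr(kKey.size() + 1));
    }
    return {};
}
```

Both functions refer to the same constant, so the key cannot drift between the writing and the reading side, which is the consistency argument a #define would not improve on.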

vector<reference_wrapper> .. things going out of scope? how does it work?

Use case: I am converting data from a very old program of mine to a database-friendly format. There are parts where I have to do multiple passes over the old data, because in particular the keys have to exist before I can reference them in relationships. So I thought, why not put the incomplete parts in a vector of references during the first pass and return it from the working function, so I can easily use that vector to make the second pass over whatever is still incomplete. I like to avoid pointers when possible, so I looked into std::reference_wrapper<T>, which seems like exactly what I need, except I don't understand its behavior at all.
I have both vector<OldData> old_data and vector<NewData> new_data as member of my conversion class. The converting member function essentially does:
//...
vector<reference_wrapper<NewData>> incomplete;
for(const auto& old_elem : old_data) {
auto& new_ref = *new_data.insert(new_data.end(), convert(old_elem));
if(is_incomplete(new_ref)) incomplete.push_back(ref(new_ref));
}
return incomplete;
However, incomplete is already broken immediately after the for loop. The program compiles, but crashes and produces gibberish. Now I don't know if I placed ref correctly, but this is only one of many attempts where I tried to put it somewhere else, use push_back or emplace_back instead, etc.
Something seems to be going out of scope, but what? Both new_data and old_data are class members, incomplete also lives outside the loop, and according to the documentation, reference_wrapper is copyable.
Here's a simplified MWE that compiles, crashes, and produces gibberish:
// includes ..
using namespace std;
int main() {
int N = 2; // works correctly for N = 1 without any other changes ... ???
vector<string> strs;
vector<reference_wrapper<string>> refs;
for(int i = 0; i < N; ++i) {
string& sref = ref(strs.emplace_back("a"));
refs.push_back(sref);
}
for (const auto& r : refs) cout << r.get(); // crash & gibberish
}
This is g++ 10.2.0 with -std=c++17, if it matters. Now I will probably just use pointers and be done, but I would like to understand what is going on here; documentation and searching do not seem to help.
The problem is that std::vector may reallocate its entire storage whenever you add an element, and a reallocation invalidates every reference to the existing elements, so the reference_wrappers stored earlier end up dangling. That is also why N = 1 appears to work: nothing is inserted after the only reference is taken. You can resolve the problem by using std::list instead of std::vector, since inserting into a list does not invalidate references to its other elements.
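To make the invalidation point concrete, here is a small sketch of two ways to keep the references valid, assuming C++17 (where emplace_back returns a reference). The function names join_with_reserve and join_with_list are made up for the example and are not from the question.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

// Option 1: reserve up front so the vector never reallocates
// while references to its elements are being collected.
std::string join_with_reserve(std::size_t n) {
    std::vector<std::string> strs;
    strs.reserve(n);  // capacity >= n, so the n insertions below do not reallocate
    std::vector<std::reference_wrapper<std::string>> refs;
    for (std::size_t i = 0; i < n; ++i) {
        refs.push_back(std::ref(strs.emplace_back("a")));
    }
    std::string out;
    for (const auto& r : refs) out += r.get();
    return out;
}

// Option 2: std::list never moves existing elements on insertion,
// so references stay valid without knowing the size in advance.
std::string join_with_list(std::size_t n) {
    std::list<std::string> strs;
    std::vector<std::reference_wrapper<std::string>> refs;
    for (std::size_t i = 0; i < n; ++i) {
        refs.push_back(std::ref(strs.emplace_back("a")));
    }
    std::string out;
    for (const auto& r : refs) out += r.get();
    return out;
}
```

Reserving only helps when the final size is known before the loop, as it is when converting a fixed old_data vector. Storing indices into the vector instead of references is another option, because an index stays meaningful after a reallocation.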

Can I get bison to make yytname externally visible?

Bison generates a table of tag names when processing my grammar, something like
static const char *const yytname[] =
{
"$end", "error", "$undefined", "TAG", "SCORE",
...
}
The static keyword keeps yytname from being visible to other parts of the code.
This would normally be harmless, but I want to format my own syntax error messages instead of relying on the ones provided to my yyerror function.
My makefile includes the following rule:
chess1.tab.c: chess.tab.c
sed '/^static const.*yytname/s/static//' $? > $@
This works, but it's not what I'd call elegant.
Is there a better way to get at the table of tag names?
You can export the table using a function which you add to your parser file:
%token-table
%code provides {
const char* const* get_yytname(void);
}
...
%%
...
%%
const char* const* get_yytname(void) { return yytname; }
You probably also want to re-export some of the associated constants.
Alternatively, you could write a function which takes a token number and returns the token name. That does a better job of encapsulation; the existence of the string table and its precise type are implementation details.
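The encapsulated alternative might look roughly like the sketch below. To keep it self-contained, it uses a mock table (mock_yytname, containing only the five names visible in the question) in place of the generated yytname, and the accessor name symbol_name is hypothetical. In a real parser the function would sit in the grammar file's epilogue, where the file-static yytname is visible, and would be declared in a %code provides block.

```cpp
#include <cstddef>

// Mock stand-in for the Bison-generated, file-static yytname table.
static const char* const mock_yytname[] = {
    "$end", "error", "$undefined", "TAG", "SCORE"
};
static const std::size_t mock_count =
    sizeof(mock_yytname) / sizeof(mock_yytname[0]);

// Hypothetical accessor: maps an internal symbol number to its name.
// Returns nullptr when the number is outside the table, so callers
// never index the table themselves.
const char* symbol_name(int symbol) {
    if (symbol < 0 || static_cast<std::size_t>(symbol) >= mock_count) {
        return nullptr;
    }
    return mock_yytname[symbol];
}
```

With this shape, callers depend only on one function signature, and the bounds check lives next to the table rather than in every error-formatting routine.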

Hudson plugin to extract code and publish it to HTML

We have a Hudson build server, and I have a fairly simple need. I'm wondering if a plugin exists for this, or if I should try to write one.
Basically, whenever a build occurs, I'd like to extract some C++ code and use it to produce an HTML table inside of an HTML file. The code specifically is a static array of structs like so:
typedef struct definition {
int val;
const char* name;
const char* desc;
} DEFINITION;
static const DEFINITION def[] =
{
{1, "abc", "def"},
{3, "ghi", "jkl"},
{5, "mno", "pqr"}
};
I'd like to extract the strings from the array and convert them into a two column HTML table. The code to write this would obviously be simple. I'm just new to Hudson and am not sure what the best approach is.
We already use Doxygen, but I'm not sure whether it can perform this job.
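One possible approach that needs no plugin is a small generator program that is compiled against the same array and run as an ordinary build step, with its output written to the HTML file. The sketch below assumes the array can be made visible to the generator (here it is simply repeated), that the two columns are name and desc, and that the helper names html_escape and definitions_to_html are free to choose; none of this comes from the question.

```cpp
#include <sstream>
#include <string>

// Same layout as the struct in the question.
struct Definition {
    int val;
    const char* name;
    const char* desc;
};

// Repeated here for the sketch; in practice this would come from a shared header.
static const Definition def[] = {
    {1, "abc", "def"},
    {3, "ghi", "jkl"},
    {5, "mno", "pqr"}
};

// Escape the characters that are special in HTML text content.
std::string html_escape(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Build a two-column table: one row per entry, name and description.
std::string definitions_to_html() {
    std::ostringstream out;
    out << "<table>\n";
    for (const Definition& d : def) {
        out << "  <tr><td>" << html_escape(d.name) << "</td><td>"
            << html_escape(d.desc) << "</td></tr>\n";
    }
    out << "</table>\n";
    return out.str();
}
```

A main function that prints definitions_to_html() to standard output would let the build redirect the result into the published HTML file.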
