How to detect if global variable is a string in LLVM? - clang

In earlier releases of LLVM/Clang I was able to detect whether a global variable was a string by calling, e.g., GlobalVar->getName() and checking whether the name ends with ".str". I've tried this with LLVM/Clang 13 and 14, and all the names I get appear to be mangled. Am I missing something?
For example, I have this basic C source code:
//compiled with: clang.exe -std=c99 helloCC.c -o helloCC.exe -mllvm -my_get_strings=1 -flegacy-pass-manager
#include <stdio.h>
char *xmy1 = "hello world";
int main(int argc, char *argv[]) {
    printf("%s", xmy1);
    return 0;
}
I've manually edited the LLVM/Clang code to trigger my function as one of the passes: clang is executed with "-flegacy-pass-manager", and I've added my pass in PassManagerBuilder.cpp, in the void PassManagerBuilder::populateModulePassManager(legacy::PassManagerBase &MPM) function.
Anyway, my runOnModule handler executes and iterates over the global variables (M.global_begin() to M.global_end()), and all the names returned by GlobalVar->getName() seem to be mangled:
found global = "??_C@_0M@LACCCNMM@hello?5world?$AA@"
Obviously, my previous approach to detecting whether a global is a string no longer works. Is there a better way to detect whether a global is a string, or am I doing something wrong?
I've tried demangling the name. I can demangle it, but I still don't know how to verify whether it is a string or not. Is there an LLVM function for this?

Well, the main question here is what you mean by "global variable is a string". If you mean C-style strings, then you'd just take the initializer (which is a Constant) and check whether it is a C-style string using the isCString method of ConstantDataSequential (https://llvm.org/doxygen/classllvm_1_1ConstantDataSequential.html#aecff3ad6cfa0e4abfd4fc9484d973e7d).
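As documented, isCString accepts an array of 8-bit elements whose last element is a null byte and which contains no other null bytes. Here is a minimal standalone sketch of that rule, using only the standard library rather than LLVM headers. The helper name looks_like_c_string is hypothetical and is not part of LLVM; it only illustrates what the check looks at.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, NOT an LLVM API: mirrors the documented isCString rule
// on a raw byte sequence, such as the bytes of a global's initializer.
// Returns true when the last byte is '\0' and no earlier byte is '\0'.
bool looks_like_c_string(std::string_view bytes) {
    if (bytes.empty() || bytes.back() != '\0')
        return false;
    // Everything before the terminator must be non-null.
    return bytes.substr(0, bytes.size() - 1).find('\0') == std::string_view::npos;
}
```

The point is that the check looks at the initializer's contents, not at the global's name, so it does not depend on how (or whether) the name is mangled.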

Related

LLVM, Get first usage of a global variable

I'm new to LLVM and I'm stuck on something that might seem basic.
I'm writing an LLVM pass to apply some transformations to global variables before they are used.
I would like to detect the first use of a global variable so that I apply the transformation only there, and not at every place where the global variable is used. It must be applied at the first use; otherwise the program crashes.
I have been reading about the AnalysisManager, and I would say that I want something similar to DominatorTree which is used for basic blocks in a function.
So the idea is to get the DominatorTree of a GlobalVariable to get the first time it is used in the code and apply there my transformation.
Given the following example:
int MyGlobal = 30;
void foo()
{
    printf("%d\n", MyGlobal);
}
int main()
{
    printf("%d\n", MyGlobal);
    foo();
}
In the example above, I only want to apply the transformation just before the first printf in the main function.
Now consider this variant:
int MyGlobal = 30;
void foo()
{
    printf("%d\n", MyGlobal);
}
int main()
{
    foo();
    printf("%d\n", MyGlobal);
}
For the example above, I would like to apply the transformation inside the foo function.
I want to avoid creating a stub function at the beginning of the program that processes all globals before the program starts running (this is what I'm actually doing now).
Does LLVM provide something that can help me do this? Or what would be the best approach to implement it?

Does the using declaration allow for incomplete types in all cases?

I'm a bit confused about the implications of the using declaration. The keyword implies that a new type is merely declared. This would allow for incomplete types. However, in some cases it is also a definition, no? Compare the following code:
#include <variant>
#include <iostream>
struct box;
using val = std::variant<std::monostate, box, int, char>;
struct box
{
    int a;
    long b;
    double c;
    box(std::initializer_list<val>) {
    }
};
int main()
{
    std::cout << sizeof(val) << std::endl;
}
In this case I'm defining val to be some instantiation of variant. Is this undefined behaviour? If the using-declaration is in fact a declaration and not a definition, incomplete types such as box would be allowed to instantiate the variant type. However, if it is also a definition, it would be UB no?
For the record, gcc and clang both print "32" as output.
Since you've not included language-lawyer, I'm attempting a non-lawyer answer.
Why should that be UB?
With a using declaration, you're just providing a synonym for std::variant<whatever>. That doesn't require an instantiation of an object, nor of the class std::variant, much like a function declaration with a parameter of that class doesn't require it:
void f(val); // just fine
The problem would occur as soon as you give that function a definition (if val is still incomplete because box is still incomplete):
void f(val) {}
But it's enough just to change val to val& for allowing a definition,
void f(val&) {}
because the compiler doesn't need to know anything else of val than its name.
Furthermore, and here I'm really speculating, "incomplete type" means that some definition is lacking at the point where it's needed, so I'd expect you to discover such an issue at compile/link time rather than be hit by UB. After all, how could the compiler and linker even finish their job successfully if a definition they need wasn't found?
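To make the declaration-versus-definition distinction concrete, here is a small sketch. It deliberately uses a hypothetical class template holder as a stand-in for std::variant, and the function names are made up for illustration; the assumption is only that holder, like a variant, stores its argument by value and so needs it to be complete when instantiated.

```cpp
#include <cassert>
#include <cstddef>

struct box;  // declared but not yet defined: incomplete at this point

// Hypothetical stand-in for std::variant: stores a T by value, so it can only
// be instantiated once T is complete.
template <class T>
struct holder {
    T value;
};

using val = holder<box>;   // only names the specialization; nothing is instantiated

void by_value(val);        // declaration only: fine while val is still incomplete
void by_reference(val&) {} // a definition taking a reference is fine as well

struct box {
    int a;
    long b;
    double c;
};  // from here on box is complete, so val can be instantiated

void by_value(val) {}      // the definition needs a complete val, available now

std::size_t size_of_val() { return sizeof(val); }  // sizeof also needs a complete type
```

Moving the definition of by_value (or size_of_val) above the definition of box should turn this into a compile error rather than something that silently misbehaves.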

How to declare const char* in .cpp file

There are many questions about declaring const string in .h files, this is not my case.
I need string (for serialization purposes if it is important) to use in
My current solution is
// file.cpp
static constexpr const char* const str = "some string key";
void MyClass::serialize()
{
    // using str
}
void MyClass::deserialize()
{
    // using str
}
Does it have any problems (e.g. memory leaks, redefinitions, UB, side effects)?
P.S. Could using #define KEY "key" be better here (speed/memory/consistency)?
Since you mentioned C++17, the best way to do this is with:
constexpr std::string_view str = "some string key";
The compiler can substitute str at the places where it is used, at compile time.
Memory-wise, no separate run-time object needs to be stored for str itself, since its value is available at compile time (the characters of the literal still live in the program's read-only data).
Speed-wise this is also marginally better, because there are fewer indirections to reach the data at run time.
Consistency-wise it is also even better since constexpr is solely used for expressions that are immutable and available at compile time. Also string_view is solely used for immutable strings so you are using the exact data type needed for you.
constexpr implies the latter const, which in turn implies the static (for a namespace-scope variable). Aside from that redundancy, this is fine.
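Both points can be checked with a couple of static_asserts. This is a minimal sketch assuming C++17; the names key_ptr, key_view and serialization_key are hypothetical and only stand in for the str and MyClass::serialize() of the question.

```cpp
#include <cassert>
#include <string_view>
#include <type_traits>

// At namespace scope, constexpr makes the variable const, and a const
// namespace-scope variable has internal linkage, so neither the trailing
// `const` nor `static` has to be spelled out.
constexpr const char* key_ptr = "some string key";        // hypothetical name
constexpr std::string_view key_view = "some string key";  // hypothetical name

// constexpr already made the pointer itself const.
static_assert(std::is_same_v<decltype(key_ptr), const char* const>);
// With string_view the length is a compile-time constant too.
static_assert(key_view.size() == 15);

// Hypothetical function standing in for MyClass::serialize().
std::string_view serialization_key() { return key_view; }
```

Either form is private to the .cpp file it appears in; the string_view variant additionally carries the length, so nothing has to call strlen at run time.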

Using ClangTool to parse a string

The following code successfully sets up a ClangTool for parsing:
std::vector<std::string> source_files;
cl::OptionCategory option_category("options");
CommonOptionsParser options_parser(argc, argv, option_category);
ClangTool tool(options_parser.getCompilations(), source_files);
However, I created argc and argv on my own -- they didn't come from main(). Instead of an input file, is it possible to get the ClangTool to parse the contents of a string? E.g.,
const char *str = "extern int foo;\n";
mapVirtualFile() looked like what I wanted, but when I tried, foo didn't come out of it.
tool.mapVirtualFile(StringRef("faux_file.c"), StringRef(str));
// Then proceed to tool.run(...), etc.
I could create a temporary file in /tmp, but I'd like to simply read from the string.
Update:
Creating a temporary file works. It's kind of hack-ish, but it's stable and consistent. It would be nice to have a way of inputting from a string, but it doesn't seem possible with the existing implementation.

Mimicking typedef in ActionScript?

I'm working on some ActionScript code that needs to juggle a bunch of similar-but-not-interchangeable types (eg, position-in-pixels, internal-position, row-and-column-position) and I'm trying to come up with a naming scheme to minimize the complexity.
Additionally, I don't yet know what the best format for the "internal position" is – using int, uint and Number all have advantages and disadvantages.
Normally I'd solve this with a typedef:
typedef float pixelPos;
typedef int internalPos;
typedef int rowColPos;
Is there any way of getting similar functionality in ActionScript?
If you're using Flex or another command-line compiler to build your project, you could add a pass from an external preprocessor to your build process.
Doesn't get the type-safety, but otherwise appears to do what you want.
I have found an article titled Typedefs in ActionScript 3, which suggests using:
const pixelPos:Class = int;
But that doesn't work – the compiler complains that "Type was not found or was not a compile-time constant: pixelPos" (note: this also happens when I use Object instead of int).
Here is an example of code which doesn't compile:
const pixelPos:Class = int;
function add3(p:pixelPos):void { // <-- type not found on this line
return p + 3;
}
Just make it static const and you can register your own class. Like this:
static const MyClass:Class = int;
And you can't make a variable with this type:
var ert:MyClass; //error
private function ert2():MyClass {}; //error
But you can make an instance:
var ert:* = new MyClass();

Resources