LLVM, Get first usage of a global variable - clang

I'm new to LLVM and I'm stuck on something that might seem basic.
I'm writing an LLVM pass that applies some transformations to global variables before they are used.
I would like to detect the first use of a global variable so that I apply the transformation only there, and not at every place where the global variable is used. It must happen at the first use, otherwise the program crashes.
I have been reading about the AnalysisManager, and I think I want something similar to DominatorTree, which is used for the basic blocks of a function.
So the idea is to get something like the DominatorTree of a GlobalVariable, find the first time it is used in the code, and apply my transformation there.
Given the following example:
int MyGlobal = 30;
void foo()
{
    printf("%d\n", MyGlobal);
}
int main()
{
    printf("%d\n", MyGlobal);
    foo();
}
In the example above, I want to apply the transformation only just before the first printf in the main function.
Now given this example:
int MyGlobal = 30;
void foo()
{
    printf("%d\n", MyGlobal);
}
int main()
{
    foo();
    printf("%d\n", MyGlobal);
}
For the example above, I would like to apply the transformation inside the foo function.
I want to avoid creating a stub function at the beginning of the program that processes all globals before the program starts running (this is what I'm actually doing now).
Does LLVM provide something that can help me do this? Or what would be the best approach to implement it?

Related

How can I get a custom python type and avoid importing a python module every time a C function is called

I am writing some functions for a C extension module for python and need to import a module I wrote directly in python for access to a custom python type. I use PyImport_ImportModule() in the body of my C function, then PyObject_GetAttrString() on the module to get the custom python type. This executes every time the C function is called and seems like it's not very efficient and may not be best practice. I'm looking for a way to have access to the python custom type as a PyObject* or PyTypeObject* in my source code for efficiency and I may need the type in more than one C function also.
Right now the function looks something like:
static PyObject* foo(PyObject* self, PyObject* args)
{
    PyObject* myPythonModule = PyImport_ImportModule("my.python.module");
    if (!myPythonModule)
        return NULL;
    PyObject* myPythonType = PyObject_GetAttrString(myPythonModule, "MyPythonType");
    if (!myPythonType) {
        Py_DECREF(myPythonModule);
        return NULL;
    }
    /* more code to create and return a MyPythonType instance */
}
To avoid retrieving myPythonType on every function call, I tried adding a global variable at the top of my C file to hold the object:
static PyObject* myPythonType;
and initialized it in the module init function, similar to the old function body:
PyMODINIT_FUNC
PyInit_mymodule(void)
{
    /* more initializing here */
    PyObject* myPythonModule = PyImport_ImportModule("my.python.module");
    if (!myPythonModule) {
        /* clean-up code here */
        return NULL;
    }
    // set the static global variable here
    myPythonType = PyObject_GetAttrString(myPythonModule, "MyPythonType");
    Py_DECREF(myPythonModule);
    if (!myPythonType) {
        /* clean-up code here */
        return NULL;
    }
    /* finish initializing module */
}
This worked; however, I am unsure how to Py_DECREF the global variable when the module is no longer in use. Is there a way to do that, or even a better way to solve this whole problem that I am overlooking?
First, just calling import each time probably isn't as bad as you think. Python keeps a cache of imported modules internally (sys.modules), so the second time you import the same module the cost is much lower. So this might be an acceptable solution.
Second, the global variable approach should work, but you're right that it doesn't get cleaned up. This is rarely a problem because modules are rarely unloaded (and most extension modules don't really support it), but it isn't great. It also won't work with isolated sub-interpreters (which isn't much of a concern now, but may become more popular in future).
The most robust way to do it needs multi-phase initialization of your module. To quickly summarise what you should do:
You should define a module state struct containing this type of information.
Your module definition should contain the size of the module state struct (m_size).
You need to initialize this struct within the Py_mod_exec slot.
You need to create an m_free function (and ideally the other GC functions, m_traverse and m_clear) to correctly decref your state during de-initialization.
Within a global module function, self will be your module object, so you can get the state with PyModule_GetState(self).

Does the using declaration allow for incomplete types in all cases?

I'm a bit confused about the implications of the using declaration. The keyword implies that a new type is merely declared. This would allow for incomplete types. However, in some cases it is also a definition, no? Compare the following code:
#include <variant>
#include <iostream>
struct box;
using val = std::variant<std::monostate, box, int, char>;
struct box
{
    int a;
    long b;
    double c;
    box(std::initializer_list<val>) {
    }
};
int main()
{
    std::cout << sizeof(val) << std::endl;
}
In this case I'm defining val to be some instantiation of variant. Is this undefined behaviour? If the using-declaration is in fact a declaration and not a definition, incomplete types such as box would be allowed to instantiate the variant type. However, if it is also a definition, it would be UB, no?
For the record, gcc and clang both print "32" as output.
Since you've not included language-lawyer, I'm attempting a non-lawyer answer.
Why should that be UB?
With a using declaration (strictly speaking, an alias declaration), you're just providing a synonym for std::variant<whatever>. That doesn't require an instance of the object, nor an instantiation of the class std::variant, much like a function declaration with a parameter of that class doesn't require it:
void f(val); // just fine
The problem would occur as soon as you give that function a definition (if val is still incomplete because box is still incomplete):
void f(val) {}
But it's enough just to change val to val& for allowing a definition,
void f(val&) {}
because the compiler doesn't need to know anything else of val than its name.
Furthermore, and here I'm really guessing, "incomplete type" means that some definition is lacking at the point where it's needed, so I expect you would discover such an issue at compile/link time rather than being hit by UB. As in, how can the compiler and linker even finish their job successfully if a definition needed to do something wasn't found?
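To make the declaration-versus-definition point concrete, here is a minimal sketch. The function names take_by_value, touch and alternative_index are made up for illustration; only box and val come from the question. It shows which declarations are accepted while box is still incomplete, and defers everything that needs the full variant type until after box is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>

struct box;  // forward declaration: box is an incomplete type from here on

// Naming the specialization in an alias does not instantiate std::variant.
using val = std::variant<std::monostate, box, int, char>;

// Declarations that mention val by value or by reference are fine here.
void take_by_value(val v);
std::size_t alternative_index(const val& v);

// A definition taking a reference is also fine, as long as the body
// does not need to know anything about val beyond its name.
void touch(val&) {}

struct box {
    int a;
    long b;
    double c;
};

// From this point box is complete, so val can be instantiated and used.
void take_by_value(val v) { (void)v; }

std::size_t alternative_index(const val& v) { return v.index(); }
```

Moving the definition of take_by_value or alternative_index above struct box is the case the answer warns about, since at that point the compiler would have to instantiate the variant with box still incomplete.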

How do I run some code only once in Dart?

I wonder if there's some language sugar or an SDK utility function in Dart that lets me protect a certain piece of code from running more than once?
E.g.
void onUserLogin() {
  ...
  runOnce(() {
    handleInitialMessage();
  });
  ...
}
I know I can add a global or class static boolean flag to check but it would be accessible in other functions of the same scope with a risk of accidental mixup in the future.
In C++ I could e.g. use a local static bool for this.
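For reference, the C++ idiom I have in mind looks roughly like the sketch below. The counter g_initialMessageCount is a hypothetical stand-in so the effect can be observed. A plain static bool like this is not thread-safe; std::call_once from <mutex> is the standard thread-safe facility.

```cpp
#include <cassert>

// Hypothetical stand-in: counts how often the one-time work actually ran.
int g_initialMessageCount = 0;

void handleInitialMessage() { ++g_initialMessageCount; }

void onUserLogin() {
    // Function-local static: initialized once, keeps its value across calls,
    // and is not visible to any other function.
    static bool hasRun = false;
    if (!hasRun) {
        hasRun = true;
        handleInitialMessage();
    }
}
```

The flag lives inside onUserLogin, so no other function in the same scope can reuse it by accident, which is the property I'm looking for in Dart.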
There is no built-in functionality to prevent code from running more than once. You need some kind of external state to know whether it actually did run.
You can't just remember whether the function itself has been seen before, because you use a function expression ("lambda") here, and every evaluation of that creates a new function object which is not even equal to other function objects created by the same expression.
So, you need something to represent the location of the call.
I guess you could hack up something using stack traces. I will not recommend that (very expensive for very little advantage).
So, I'd recommend something like:
class RunOnce {
  bool _hasRun = false;
  void call(void Function() function) {
    if (_hasRun) return;
    // Set after calling if you don't want a throw to count as a run.
    _hasRun = true;
    function();
  }
}
...
static final _runOnce = RunOnce();
void onUserLogin() {
  _runOnce(handleInitialMessage);
}
It's still just a static global that can be accidentally reused.

Possible to create Graal native function callable from C without isolate?

I'd like to create a library, written in Java, callable from C, with simple method signatures:
int addThree(int in) {
    return in + 3;
}
I know it's possible to do this with GraalVM if you do a little dance and create an Isolate in your C program and pass it in as the first parameter in every function call. There is good sample code here.
The problem is that the system I'm writing for, Postgres, can load C libraries and call functions in them, but I would have to create a wrapper function in C that would wrap every function I wanted to expose. This really limits the value of being able to slap something together in Java and use it in Postgres directly. I'd have to do something like this:
int myPublicAddThreeFunction(int in) {
    graal_isolatethread_t *thread = NULL;
    if (graal_create_isolate(NULL, NULL, &thread) != 0) {
        fprintf(stderr, "error on isolate creation or attach\n");
        return 1;
    }
    return SomeClassName_addThree_big_random_string_here(thread, in);
}
Is there a way, in Java alone, to expose a simple C function? I'm thinking I could create the isolate in a static method that gets loaded once on startup, somehow set it as the current isolate, and have the Java method just use it. Haven't been able to figure it out, though.
Also, it would be real nice not to have to append a big random string to every function name.

SWIG %extend variables

In most cases I'm happy with the way SWIG handles data; however, I'm facing an issue and cannot find an answer in the documentation.
First of all, I'm using SWIG with Lua and have the following structures wrapped:
typedef struct
{
    %mutable;
    float x,y,z;
    ...
    ...
} Vector3;
typedef struct
{
    ...
    ...
    %immutable;
    Vector3 gravity;
    ...
    ...
    %extend
    {
        void SetGravity(Vector3 gravity)
        {
            WorldSetGravity($self,gravity);
        }
    };
} World;
As you can see, the gravity XYZ can be changed by calling the SetGravity function, and it works great.
However, to make this more intuitive and easier to use, I would like to let the user set each component (X, Y or Z) independently, like:
world.gravity.x=-10;
But I need to call SetGravity in the background in order to send the proper value to the physics engine (which is not exposed to Lua).
I would like to know if there's a way to %extend variables that would allow me to call SetGravity when world.gravity.x, y or z is set.
Or to implement my own version of the wrap function for each component, like _wrap_World_gravity_set_x, which would allow me to call SetGravity in the background.
Firstly it's worth noting that this problem is harder than simply making a "virtual" member variable using %extend that automatically calls an extra function when someone modifies it. This is because you want the fact that it's a member of another class to alter the behaviour.
I can see several fundamental approaches you could take to get this behaviour:
1. Inject some extra code in the target scripting language to hook the set.
2. Inject some extra stuff in the SWIG interface to transparently convert the Vector3 inside World to something that still looks and feels the same, but has the behaviour you want under the hood.
3. Inject some extra code into the memberin typemap for Vector3 that checks the context it's being called from and modifies the behaviour accordingly.
Of these #2 is my preferred solution because #1 is language specific (and I don't know Lua well enough to do it!) and #3 feels dirty from a software engineering perspective.
To implement #2 I did the following:
%module test
%{
#include "test.h"
%}
typedef struct
{
    %mutable;
    float x,y,z;
} Vector3;
%nodefaultctor Vector3Gravity;
%{
// Inside C this is just a typedef!
typedef Vector3 Vector3Gravity;
// But we have magic for sets/gets now:
#define MEMBER_VAR(ct,vt,rt,n) \
    SWIGINTERN void ct##_##n##_set(ct *self, const vt val) { \
        self->n = val; \
        /* Need to find a way to lookup world here */ \
        WorldSetGravity(world, self); \
    } \
    SWIGINTERN vt ct##_##n##_get(const ct *self) { return self->n; }
MEMBER_VAR(Vector3Gravity, float, Vector3, x)
MEMBER_VAR(Vector3Gravity, float, Vector3, y)
MEMBER_VAR(Vector3Gravity, float, Vector3, z)
%}
// Inside SWIG Vector3Gravity is a distinct type:
typedef struct
{
    %mutable;
    %extend {
        float x,y,z;
    }
} Vector3Gravity;
%typemap(memberin,noblock=1) Vector3Gravity gravity %{
    $1 = *((const Vector3*)$input);
    WorldSetGravity($self, $1); // This gets expanded to automatically make this call
%}
typedef struct
{
    // This is a blatant lie!
    Vector3Gravity gravity;
} World;
Essentially we're lying and claiming that the gravity member of World is a "special" type, whereas really it's just a Vector3. Our special type has two distinct features. Firstly, sets/gets on its members are implemented as C code by us. Secondly, when we set this member we automatically make an extra call rather than just passing the values in.
There are two things possibly missing from this example that you might want:
Transparent conversion from Vector3Gravity to Vector3. (As it stands anything other than the set for gravity will refuse to accept Vector3Gravity instances). You can make that transparent by using the overload resolution mechanisms of SWIG/Lua if needed.
Inside the setters for Vector3Gravity we don't know which world this gravity belongs to.
We could solve that in several ways, the simplest being to implicitly set a static pointer every time we create a Vector3Gravity. This would make sense if there only ever is one world.
Another approach would be to use a global map of Vector3Gravity instances to worlds that gets maintained automatically.
Finally, instead of using a typedef for the Vector3Gravity type we could make it a real distinct type, with a compatible layout, but add a pointer to the World it came from. That's more work though.
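As a rough illustration of the global map idea, here is a plain C++ sketch that is independent of SWIG. Everything in it is an assumption made for the example: the engineGravity field stands in for the physics engine's copy of the value, the WorldSetGravity body and signature are invented (the real ones are not shown in the question), and registerWorld, unregisterWorld and gravityOwners are hypothetical helper names.

```cpp
#include <cassert>
#include <unordered_map>

struct Vector3 { float x, y, z; };
using Vector3Gravity = Vector3;  // same layout, mirroring the typedef above

struct World {
    Vector3 gravity{0.0f, 0.0f, 0.0f};
    // Hypothetical stand-in for the value held by the physics engine.
    Vector3 engineGravity{0.0f, 0.0f, 0.0f};
};

// Stand-in for the real WorldSetGravity; the signature is assumed.
void WorldSetGravity(World* world, const Vector3& g) { world->engineGravity = g; }

// Global registry: address of a gravity vector -> the world that owns it.
std::unordered_map<const Vector3Gravity*, World*>& gravityOwners() {
    static std::unordered_map<const Vector3Gravity*, World*> owners;
    return owners;
}

void registerWorld(World* w)   { gravityOwners()[&w->gravity] = w; }
void unregisterWorld(World* w) { gravityOwners().erase(&w->gravity); }

// What a hand-written setter for the x component could do: store the value,
// look up the owning world, and forward the whole vector to the engine.
// Returns false if the vector does not belong to a registered world.
bool Vector3Gravity_x_set(Vector3Gravity* self, float val) {
    self->x = val;
    auto it = gravityOwners().find(self);
    if (it == gravityOwners().end()) return false;
    WorldSetGravity(it->second, *self);
    return true;
}
```

The registry has to be kept in sync wherever worlds are created and destroyed, which is the "maintained automatically" part mentioned above. With a single world, the static pointer variant is simpler.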

Resources