Build a path of LLVM basic blocks

I have to write an LLVM analysis pass for an exam project. The pass must print the independent paths of a function using the baseline method.
I am currently struggling with how to build the baseline path by traversing the basic blocks. I know that the basic blocks are already organized in a CFG, but after checking the documentation I can't find any method for building a linked list of basic blocks that represents a path from the entry point to the end point of a function. I am not an expert in LLVM, so I would like to ask whether someone with more experience knows how to build this kind of path.
Thank you, everyone.
Update: I followed the advice in the answer to this post and wrote this code to build a path:
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/CFG.h"
#include <set>
#include <list>
using namespace llvm;
using namespace std;
void Build_Baseline_path(BasicBlock *Start, set<BasicBlock *> Explored, list<BasicBlock *> Decision_points, list<BasicBlock *>Path) {
    for (BasicBlock *Successor : successors(Start)) {
        Instruction *Teriminator = Successor->getTerminator();
        const char *Instruction_string = Teriminator->getOpcodeName();
        if (Instruction_string == "br" || Instruction_string == "switch") {
            errs() << "Decision point found" << "\n";
            Decision_points.push_back(Successor);
        }
        if (Instruction_string == "ret") {
            if (Explored.find(Successor) == Explored.end()) {
                errs() << "Added node to the baseline path" << "\n";
                Path.push_back(Successor);
                return;
            }
            return;
        }
        if (Explored.find(Successor) == Explored.end()) {
            Path.push_back(Successor);
            Build_Baseline_path(Successor,Explored,Decision_points,Path);
        }
    }
}
I wrote this code in a separate .cpp file and included it in my function pass. When I run the pass with this function, everything hangs and it seems as if my PC is crashing. If I comment out the call to this function, the pass works fine, so the problem must be in this code. What is wrong with it? I am a novice with C++ and can't figure out how to solve this.

First off, there isn't a single end point. At least four kinds of instructions may be end points: return, unreachable and, in some cases, call/invoke (when the called function throws and the exception isn't caught in this function).
Accordingly, there are many possible paths. Depending on how you treat loops, the number of possible paths is not even guaranteed to be finite.
If you treat loops in a simplistic way and ignore exceptions, then it's simple to construct a list of paths. LLVM provides successors(), which returns a range over a block's successors and which you can use as in this answer. Call successors() in a recursive function to process each successor, and when you reach a return or a similar terminator, act on the path you've built.
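As a rough illustration of that recursion, here is a minimal sketch on a toy graph rather than on real LLVM types. The names Graph, Path, collect_paths and all_paths are hypothetical and not part of LLVM. It assumes a block with no successors counts as an end point and that a block already on the current path is skipped, so each loop is entered at most once per path. In a real pass, succ[node] would become successors(BB) and the exit test would inspect BB->getTerminator(). The containers are passed by reference so that the bookkeeping survives across recursive calls. The code in the question passes them by value and never inserts into Explored, which probably explains why it never terminates on a CFG with a loop. It also compares the const char * opcode names with ==, which compares pointers rather than string contents.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: succ[i] lists the successors of node i; a node with no
// successors is treated as an end point (assumption for this sketch).
using Graph = std::vector<std::vector<std::size_t>>;
using Path = std::vector<std::size_t>;

// Depth-first walk that records every entry-to-exit path. A node that is
// already on the current path is skipped, so the recursion always terminates.
void collect_paths(const Graph& succ, std::size_t node, Path& current,
                   std::vector<bool>& on_path, std::vector<Path>& out) {
    assert(node < succ.size());
    current.push_back(node);
    on_path[node] = true;
    if (succ[node].empty()) {
        out.push_back(current);  // reached an end point: keep the finished path
    } else {
        for (std::size_t next : succ[node]) {
            if (!on_path[next]) {
                collect_paths(succ, next, current, on_path, out);
            }
        }
    }
    on_path[node] = false;  // backtrack so sibling branches can reuse this node
    current.pop_back();
}

// Convenience wrapper that sets up the shared state (passed by reference above).
std::vector<Path> all_paths(const Graph& succ, std::size_t entry) {
    std::vector<Path> out;
    Path current;
    std::vector<bool> on_path(succ.size(), false);
    collect_paths(succ, entry, current, on_path, out);
    return out;
}
```

Calling all_paths(g, 0) on a graph whose node 0 is the entry returns one vector of node indices per path. Choosing the baseline path and the remaining independent paths from that list is a separate step that this sketch does not attempt.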

Related

Possible to create Graal native function callable from C without isolate?

I'd like to create a library, written in Java, callable from C, with simple method signatures:
int addThree(int in) {
    return in + 3;
}
I know it's possible to do this with GraalVM if you do a little dance and create an Isolate in your C program and pass it in as the first parameter in every function call. There is good sample code here.
The problem is that the system I'm writing for, Postgres, can load C libraries and call functions in them, but I would have to create a wrapper function in C that would wrap every function I wanted to expose. This really limits the value of being able to slap something together in Java and use it in Postgres directly. I'd have to do something like this:
int myPublicAddThreeFunction(int in) {
    graal_isolatethread_t *thread = NULL;
    if (graal_create_isolate(NULL, NULL, &thread) != 0) {
        fprintf(stderr, "error on isolate creation or attach\n");
        return 1;
    }
    return SomeClassName_addThree_big_random_string_here(thread, in);
}
Is there a way, in Java alone, to expose a simple C function? I'm thinking I could create the isolate in a static method that gets loaded once on startup, somehow set it as the current isolate, and have the Java method just use it. Haven't been able to figure it out, though.
Also, it would be real nice not to have to append a big random string to every function name.

vector<reference_wrapper> .. things going out of scope? how does it work?

Use case: I am converting data from a very old program of mine to a database-friendly format. There are parts where I have to do multiple passes over the old data, because the keys in particular have to exist before I can reference them in relationships. So I thought, why not put the incomplete parts in a vector of references during the first pass and return it from the working function, so I can easily use that vector to make the second pass over whatever is still incomplete? I like to avoid pointers when possible, so I looked into std::reference_wrapper<T>, which seems like exactly what I need, except that I don't understand its behavior at all.
I have both vector<OldData> old_data and vector<NewData> new_data as members of my conversion class. The converting member function essentially does:
//...
vector<reference_wrapper<NewData>> incomplete;
for(const auto& old_elem : old_data) {
    auto& new_ref = *new_data.insert(new_data.end(), convert(old_elem));
    if(is_incomplete(new_ref)) incomplete.push_back(ref(new_ref));
}
return incomplete;
However, incomplete is already broken immediately after the for loop. The program compiles, but crashes and produces gibberish. I don't know whether I placed ref correctly, but this is only one of many attempts: I also tried putting it elsewhere, using push_back or emplace_back instead, and so on.
Something seems to be going out of scope, but what? Both new_data and old_data are class members, incomplete also lives outside the loop, and according to the documentation, reference_wrapper is copyable.
Here's a simplified MWE that compiles, crashes, and produces gibberish:
// includes ..
using namespace std;
int main() {
    int N = 2; // works correctly for N = 1 without any other changes ... ???
    vector<string> strs;
    vector<reference_wrapper<string>> refs;
    for(int i = 0; i < N; ++i) {
        string& sref = ref(strs.emplace_back("a"));
        refs.push_back(sref);
    }
    for (const auto& r : refs) cout << r.get(); // crash & gibberish
}
This is g++ 10.2.0 with -std=c++17, if that matters. I will probably just use pointers and be done with it, but I would like to understand what is going on here; documentation and searching don't seem to help.
The problem is that std::vector may reallocate the storage for the entire vector whenever you add an element, which invalidates every reference you previously took to its elements. That is also why N = 1 appears to work: nothing is added after the only reference is taken. You can resolve the problem by using std::list instead of std::vector, because a list never relocates its existing elements.
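The std::list suggestion can be sketched as follows. This is only an illustration under stated assumptions: collect_refs is a hypothetical helper, not code from the question, and the strings "a0", "a1", ... are made up so that the elements are distinguishable.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <string>
#include <vector>

// Appends n strings to `storage` and returns a reference to each of them.
// std::list never relocates existing elements, so references taken early
// remain valid after later insertions.
std::vector<std::reference_wrapper<std::string>>
collect_refs(std::list<std::string>& storage, int n) {
    assert(n >= 0);
    std::vector<std::reference_wrapper<std::string>> refs;
    for (int i = 0; i < n; ++i) {
        storage.push_back("a" + std::to_string(i));
        std::string& s = storage.back();  // refers to the element stored in the list
        refs.push_back(s);                // reference_wrapper converts implicitly from string&
    }
    return refs;
}
```

The returned wrappers stay usable for as long as storage outlives them and the referenced elements are not erased. If the data must stay in a std::vector, two other options are to call reserve() with the final size before taking any references, or to store indices instead of references.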

How to get the size of a user defined struct? (sizeof)

I've got a structure with C representation:
struct Scard_IO_Request {
    proto: u32,
    pciLength: u32
}
When I try to get its size (as with sizeof() in C) using:
mem::sizeof<Scard_IO_Request>();
I get a compilation error:
"error: `sizeof` is a reserved keyword"
Why can't I use this sizeof function like in C? Is there an alternative?
For two reasons:
1. There is no such function as "sizeof", so the compiler is going to have a rather difficult time calling it.
2. That's not how you invoke generic functions.
If you check the documentation for mem::size_of (which you can find even if you search for "sizeof"), you will see that it includes a runnable example which shows you how to call it. For posterity, the example in question is:
fn main() {
    use std::mem;
    assert_eq!(4, mem::size_of::<i32>());
}
In your specific case, you'd get the size of that structure using
mem::size_of::<Scard_IO_Request>()

Requiring a LuaJIT module in a subdir is overwriting a module of the same name in the parent dir

I have a file setup like this:
main.lua (requires 'mydir.b' and then 'b')
b.lua
mydir/
    b.so (LuaJIT C module)
From main, I do this:
function print_loaded()
    for k, v in pairs(package.loaded) do print(k, v) end
end
print_loaded()
require 'mydir.b'
print_loaded()
-- This would now include 'mydir.b' instead of 'b':
local b = require 'b'
The outputs of the prints show that my call to require 'mydir.b' is setting the return value as the value of package.loaded['b'] as well as the expected package.loaded['mydir.b']. I wanted to have package.loaded['b'] left unset so that I can later require 'b' and not end up with the (in my opinion incorrectly) cached value from mydir.b.
My question is: What's a good way to deal with this?
In my case, I want to be able to copy around mydir as a subdir of any of my LuaJIT projects, and not have to worry about mydir.whatever polluting the module namespace by destroying any later requires of whatever at the parent directory level.
In anticipation of people saying, "just rename your modules!" Yes. I can do that. But I'd love to know if there's a better solution that allows me to simply not have to worry about the name collisions at all.
The problem was that I was calling luaL_register incorrectly from within my b.so's source file (b.c).
Here is the bad code that caused the problem:
static const struct luaL_reg b[] = {
    /* set up a list of function pointers here */
};
int luaopen_mydir_b(lua_State *L) {
    luaL_register(L, "b", b); // <-- PROBLEM HERE (see below)
    return 1; // 1 = # Lua-visible return values on the stack.
}
The problem with the highlighted line is that it will specifically set package.loaded['b'] to have the return value of this module when it's loaded. This can be fixed by replacing the line with this:
luaL_register(L, "mydir.b", b);
which will set package.loaded['mydir.b'] instead, and thus leave room for later use of a module with the same name (without the mydir prefix).
I didn't realize this until long after I asked this question, when I finally got around to reading the official docs for luaL_register for Lua 5.1, which is the version LuaJIT complies with.

How can I load an unnamed function in Lua?

I want users of my C++ application to be able to provide anonymous functions to perform small chunks of work.
Small fragments like this would be ideal.
function(arg) return arg*5 end
Now I'd like to be able to write something as simple as this for my C code,
// Push the function onto the lua stack
lua_xxx(L, "function(arg) return arg*5 end" )
// Store it away for later
int reg_index = luaL_ref(L, LUA_REGISTRY_INDEX);
However, I don't think luaL_loadstring will do "the right thing".
Am I left with what feels to me like a horrible hack?
void push_lua_function_from_string( lua_State * L, std::string code )
{
    // Wrap our string so that we can get something useful for luaL_loadstring
    std::string wrapped_code = "return "+code;
    luaL_loadstring(L, wrapped_code.c_str());
    lua_pcall( L, 0, 1, 0 );
}
push_lua_function_from_string(L, "function(arg) return arg*5 end" );
int reg_index = luaL_ref(L, LUA_REGISTRY_INDEX);
Is there a better solution?
If you need access to parameters, the way you have written it is correct. luaL_loadstring pushes a function that represents the chunk you are compiling. If you want to actually get a function back from the code, the chunk has to return it. I also do this (in Lua) for little "expression evaluators", and I don't consider it a "horrible hack" :)
If you only need some callbacks, without any named parameters, you can write the code directly and use the function that luaL_loadstring leaves on the stack. You can even pass arguments to this chunk; they will be accessible through the ... expression. Then you can get the arguments as:
local arg1, arg2 = ...
-- rest of code
You decide what is better for you - "ugly code" inside your library codebase, or "ugly code" in your Lua functions.
Have a look at my ae. It caches functions from expressions so you can simply say ae_eval("a*x^2+b*x+c") and it'll only compile it once.
