Find loops in LLVM bytecode - clang

I want to find simple loops in LLVM bytecode and extract basic
information about each loop.
For example:
for (i = 0; i < 1000; i++)
    sum += i;
I want to extract the bound [0, 1000), the loop variable "i", and the
loop body (sum += i).
What should I do?
I have read the LLVM API documentation and found some useful classes,
such as "Loop" and "LoopInfo", but I do not know how to use them in detail.
Could you please give me some help? A detailed usage example would be most helpful.

If you do not want to use the pass manager, you might need to call the Analyze method of the llvm::LoopInfoBase class on each function in the IR (assuming you are using LLVM 3.4). However, the Analyze method takes the DominatorTree of each function as input, which you have to generate first. The following code is what I tested with LLVM 3.4 (assuming you have read the IR file and converted it into a Module* named module):
for (llvm::Module::iterator func = module->begin(), end = module->end(); func != end; ++func) {
    // Build the dominator tree of the current function. In LLVM 3.4 the
    // DominatorTree pass object holds a DominatorTreeBase in its DT member,
    // which is where recalculate() lives.
    llvm::DominatorTree *DT = new llvm::DominatorTree();
    DT->DT->recalculate(*func);
    // Generate the LoopInfoBase for the current function.
    llvm::LoopInfoBase<llvm::BasicBlock, llvm::Loop> *KLoop =
        new llvm::LoopInfoBase<llvm::BasicBlock, llvm::Loop>();
    KLoop->releaseMemory();
    KLoop->Analyze(DT->getBase());
}
Basically, with KLoop generated, you get all kinds of loop information at the IR level. You can refer to the APIs of the LoopInfoBase class for details. By the way, you might want to add the following headers:
"llvm/Analysis/LoopInfo.h" and "llvm/Analysis/Dominators.h".

Once you get to the LLVM IR level, the information you request may no longer be accurate. For example, clang may have transformed your code so that i counts from -1000 up to 0 instead. Or it may have optimised "i" away entirely, so that there is no explicit induction variable. If you really need to extract the information exactly as it appears, at face value, in the C code, then you need to look at clang, not LLVM IR. Otherwise, the best you can do is to calculate a loop trip count, in which case have a look at the ScalarEvolution pass.
Check the PowerPC hardware loops transformation pass, which demonstrates the trip count calculation fairly well: http://llvm.org/docs/doxygen/html/PPCCTRLoops_8cpp_source.html
The code is fairly heavy, but should be followable. The interesting function is PPCCTRLoops::convertToCTRLoop. If you have any further questions about that, I can try to answer them.
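In the meantime, here is a minimal sketch (not from that pass; LLVM 3.4-era legacy pass API, with registration boilerplate omitted) of querying ScalarEvolution for a trip count from inside a loop pass:
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Support/raw_ostream.h"
// Inside a LoopPass subclass:
virtual void getAnalysisUsage(AnalysisUsage &AU) const {
    AU.addRequired<ScalarEvolution>();
}
virtual bool runOnLoop(Loop *L, LPPassManager &) {
    ScalarEvolution &SE = getAnalysis<ScalarEvolution>();
    // A SCEV expression for the number of times the backedge executes;
    // if it folds to a SCEVConstant, the trip count is statically known.
    const SCEV *backedgeCount = SE.getBackedgeTakenCount(L);
    backedgeCount->print(llvm::errs());
    return false;
}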

LLVM is just a library. You won't find AST nodes there.
I suggest having a look at Clang, which is a compiler built on top of LLVM.
Maybe this is what you're looking for?

Much like Matteo said, for LLVM to be able to recognize the loop variable and condition, the file needs to be in LLVM IR. The question says you have it in LLVM bytecode, but since LLVM IR is written in SSA form, talking about "loop variables" isn't really meaningful. I'm sure that if you describe what you're trying to do and what kind of result you expect, we can be of further help.
Some code to help you get started:
virtual void getAnalysisUsage(AnalysisUsage &AU) const {
    AU.addRequired<LoopInfo>();
}
bool runOnLoop(Loop *L, LPPassManager &) {
    BasicBlock *h = L->getHeader();
    if (BranchInst *bi = dyn_cast<BranchInst>(h->getTerminator())) {
        // The branch condition that decides whether the loop keeps running.
        Value *loopCond = bi->getCondition();
    }
    return false;
}
This code snippet is from inside a regular LLVM loop pass (a LoopPass subclass).
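As a hedged continuation of that snippet: if the condition turns out to be an integer comparison, its operands are typically the induction variable and the loop bound, at least before optimization rewrites the loop.
if (ICmpInst *cmp = dyn_cast<ICmpInst>(loopCond)) {
    Value *indVar = cmp->getOperand(0); // e.g. "i"
    Value *bound  = cmp->getOperand(1); // e.g. the constant 1000
}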

Just an update on Junxzm's answer: some references, pointers, and methods have changed in LLVM 3.5.
for (llvm::Module::iterator f = m->begin(), fe = m->end(); f != fe; ++f) {
    llvm::DominatorTree DT = llvm::DominatorTree();
    DT.recalculate(*f);
    llvm::LoopInfoBase<llvm::BasicBlock, llvm::Loop> *LoopInfo =
        new llvm::LoopInfoBase<llvm::BasicBlock, llvm::Loop>();
    LoopInfo->releaseMemory();
    LoopInfo->Analyze(DT);
}

Lua / Coroutines: how to measure how much memory standard functions take

I'm currently trying to find a way to return a value representing how much memory a standard function takes, or its execution time (as a static thread). I thought about using coroutines, but I cannot make any working prototypes. Thanks in advance for any help! (:
The Lua function collectgarbage, called with the string "count" as an argument, returns a number that reflects the amount of memory currently in use by the interpreter. Here is an example:
function memuse()
    -- collectgarbage("count") reports kilobytes; multiply by 1024 and
    -- take the integer part to get whole bytes.
    local i = math.modf(collectgarbage("count") * 1024)
    return i
end
This function returns the amount of memory, in bytes, currently in use by Lua.
As for time, the simplest way is to call os.time(), which returns the current system time. Note, though, that this only returns the number of seconds to the nearest whole number. If you need greater precision, there are a few options: one, make a system call with io.popen to retrieve the current system time, including the fractional part; or two, implement some time-related functions in C/C++ and call them from Lua. I have used both options, and the second produces excellent accuracy, but for the sake of simplicity I will just show the first one.
-- Function called 'tick' to retrieve the current OS time.
function tick()
    local fil = assert(io.popen("date +%s.%N"))
    local str = fil:read("*all")
    return tonumber(str)
end
Lua file handles, one of which is produced by the call to io.popen, have their own destructors, so they need not be explicitly closed; however, you may want to call fil:close() anyway, for the sake of prompt garbage collection and to avoid errors related to open files.
If you want to pursue the second, more complicated option, I suggest creating a timer class in C++ that makes use of the chrono library to retrieve the system time.
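For what it's worth, a minimal sketch of such a class might look like this (the name Timer and its interface are illustrative, not a fixed design):
#include <chrono>

class Timer {
    std::chrono::steady_clock::time_point start_;
public:
    Timer() : start_(std::chrono::steady_clock::now()) {}
    // Seconds elapsed since construction, with sub-second precision.
    double elapsed() const {
        std::chrono::duration<double> d = std::chrono::steady_clock::now() - start_;
        return d.count();
    }
};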
I am not sure that these two functions are of relevance to you, but I hope they help.

Can I get a solution using "timeout" when using Optimize.minimize()?

I'm trying to minimize a variable, but z3 takes too long to give me a solution.
I would like to know whether it's possible to get a solution when the timeout is triggered.
If yes, how can I do that?
Thanks in advance!
If by "solution" you mean the latest approximation of the optimal value, then you may be able to retrieve it, provided that the optimization algorithm being used finds any intermediate solution along the way. (Some optimization algorithms --like, e.g., maxres-- don't find any intermediate solution).
Example:
import z3

o = z3.Optimize()
o.add(...very hard problem...)
cf = z3.Int('cf')
o.add(cf == ...)    # note: '==' builds a constraint; '=' would be a Python error
obj = o.minimize(cf)
o.set(timeout=...)  # timeout in milliseconds
res = o.check()
print(res)
print(obj.upper())
Even when res is unknown because of the timeout, the objective instance obj contains the latest approximation of the optimum value found by z3 before the timeout.
Unfortunately, I am not sure whether it is also possible to retrieve the corresponding sub-optimal model with o.model() (or any other method).
For OptiMathSAT, I show how to retrieve the latest approximation of the optimum value and the corresponding model in the unit-test timeout.py.

How to use LogiCORE DSP48 Macro?

I want to learn how to use the LogiCORE DSP48 Macro. I'm reading the Xilinx documentation, but I cannot quite understand how to start my first design with the DSP48 Macro. Can anyone help me build a simple design to get a better understanding of this IP core, please?
Thanks in advance!
In many cases you would use DSP48 by writing Verilog/VHDL expressions containing add, subtract, and multiply.
x = a * b + c
A problem with the above expression is that the multiplication and addition take place in a single cycle. You can run the expression at a higher frequency if the operation is pipelined. Vivado can sometimes retime these expressions across registers in order to make use of the DSP48 pipeline registers.
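For illustration, here is a sketch of a manually pipelined multiply-add (module and signal names are mine, not from any Xilinx template) that synthesis tools can usually map onto a DSP48 slice:
module mac25x18 (
    input  wire        clk,
    input  wire [24:0] a,
    input  wire [17:0] b,
    input  wire [47:0] c,
    output reg  [47:0] x
);
    reg [42:0] m;   // registered 25x18 product (multiply stage)
    reg [47:0] c_q; // c delayed one cycle to line up with m
    always @(posedge clk) begin
        m   <= a * b;
        c_q <= c;
        x   <= m + c_q; // add stage
    end
endmodule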
However, I understand wanting to use the DSP48 directly. You instantiate DSP48s just like other RTL modules. The ports, parameters, and behavior are described in the DSP slice user guide for the FPGA family you are using.
wire [47:0] c;
wire [24:0] a;
wire [17:0] b;
wire [47:0] x;

// Port names on the DSP48E1 primitive are uppercase; the clock and the
// pipeline-register parameters are omitted here for brevity.
DSP48E1 #() dsp (
    .A({5'b0, a}), // the A port is 30 bits wide
    .B(b),
    .C(c),
    .P(x),
    .OPMODE(5),
    .ALUMODE(0)
);
This instance is copied from one of my inner-product implementations. It is fully pipelined because I was aiming for 500 MHz operation; I only achieved 400 MHz, due to other combinational paths.
For Xilinx 7 Series:
DSP48E1 Slice User Guide
For Xilinx Ultrascale:
DSP48E2 Slice User Guide

trying to rewrite this so it doesn't violate principles discussed in Code Complete, 2nd edition

function profit(){
    int totalSales = 0;
    for (int i = 0; i < 12; i++) // compute yearly sales
        totalSales += montlysales[i];
    return get_profit_from_sales(totalSales);
}
So I've already determined that the 12 in the for loop should be a named constant instead of a bare integer, and that montlysales should be passed into the function as a parameter, so that a check can be run to see whether the length of sales equals the number of months, which is also twelve.
I'm not sure whether those are all the violations of the principles. I feel the last line
return get_profit_from_sales(totalSales)
is wrong, and it's really bothering me that I can't seem to figure out why; I think I might have skipped something else.
Can anyone help me verify?
Summary: you should factor the call to the other function out of this one, making this function pure so that it does only one thing. That reduces complexity and improves your ability to reason abstractly about your program and its correctness.
Your spidey sense is tingling and you should trust it - you are correct, but what is wrong is subtle.
Routines are best when they do one thing, and one thing only. Purity of purpose serves the prime imperative, management of complexity: it allows our brains to juggle more, because each piece is simpler. You can just look at the function and know what it does; you don't have to say, "it totals the sales, but it also calls another function at the end", which clouds its mission.
This is also part of functional programming, and it is where I feel languages have to adapt in order to implement the prime imperative spoken of in Code Complete. Functional programming has as one of its tenets "no side effects", which is similar to "one mission" or "one purpose". What I've done to your code can also be seen as making it more functional: just inputs and outputs, and nothing in or out via any other path.
Note also that the function get_profit() below reads like pseudocode, making it somewhat self-documenting: ideally you don't have to delve into any of the functions it calls to understand what it does or how it does it.
So, here is my idea of what I explained above (loosely coded, and not checked!).
const int MONTHS_PER_YEAR = 12; // the named constant you already identified

function get_total_sales(int montlysales[]){
    int totalSales = 0;
    for (int i = 0; i < MONTHS_PER_YEAR; i++) // compute yearly sales
        totalSales += montlysales[i];
    return totalSales;
}

function calc_profit(int all_sales, int all_commissions, int all_losses){
    // your calculations here
    int profit = all_sales - all_commissions - all_losses; // ... etc.
    return profit;
}

function get_profit(){
    int montlysales[] = read_from_disk();
    int total_sales = get_total_sales(montlysales);
    int tot_commissions = get_tot_commissions();
    int tot_losses = get_tot_losses();
    int profit_made = calc_profit(total_sales, tot_commissions, tot_losses);
    return profit_made;
}
I read Code Complete about once a year, as coding is truly subtle at times, because it is so multidimensional. This has been very helpful to me. Regards - Stephen

Compile openmp into pthreads C code

I understand that OpenMP is in fact just a set of macros which is compiled into pthreads. Is there a way of seeing the pthread code before the rest of the compilation occurs? I am using GCC to compile.
First, OpenMP is not simply a set of macros. It may look like a simple transformation into pthread-like code, but OpenMP requires more than that, including runtime support.
Back to your question: at least in GCC, you can't see the pthreaded code, because GCC's OpenMP implementation is done in the compiler back-end (or middle-end). The transformation is done at the IR (intermediate representation) level. So, from the programmer's viewpoint, it's not easy to see how the code is actually transformed.
However, there are some references.
(1) An Intel engineer provided a great overview of the implementation of OpenMP in Intel C/C++ compiler:
http://www.drdobbs.com/parallel/how-do-openmp-compilers-work-part-1/226300148
http://www.drdobbs.com/parallel/how-do-openmp-compilers-work-part-2/226300277
(2) You may take a look at the implementation of GCC's OpenMP:
https://github.com/mirrors/gcc/tree/master/libgomp
Note that libgomp.h uses pthreads, and loop.c contains the implementation of the parallel-loop construct.
OpenMP is a set of compiler directives, not macros. In C/C++ those directives are implemented with the #pragma extension mechanism while in Fortran they are implemented as specially formatted comments. These directives instruct the compiler to perform certain code transformations in order to convert the serial code into parallel.
Although it is possible to implement OpenMP as transformation to pure pthreads code, this is seldom done. Large part of the OpenMP mechanics is usually built into a separate run-time library, which comes as part of the compiler suite. For GCC this is libgomp. It provides a set of high level functions that are used to easily implement the OpenMP constructs. It is also internal to the compiler and not intended to be used by user code, i.e. there is no header file provided.
With GCC it is possible to get a pseudocode representation of what the code looks like after the OpenMP transformation. You have to supply the -fdump-tree-all option, which results in the compiler emitting a large number of intermediate files for each compilation unit. The most interesting one is filename.017t.ompexp (this comes from GCC 4.7.1; the number might be different in other GCC versions, but the extension would still be .ompexp). This file contains an intermediate representation of the code after the OpenMP constructs were lowered and then expanded into their proper implementation.
Consider the following example C code, saved as fun.c:
void fun(double *data, int n)
{
#pragma omp parallel for
for (int i = 0; i < n; i++)
data[i] += data[i]*data[i];
}
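For reference, a command along these lines produces the dump files discussed above (the numeric prefix of the dump file varies between GCC versions):
gcc -c -fopenmp -fdump-tree-all fun.c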
The content of fun.c.017t.ompexp is:
fun (double * data, int n)
{
...
struct .omp_data_s.0 .omp_data_o.1;
...
<bb 2>:
.omp_data_o.1.data = data;
.omp_data_o.1.n = n;
__builtin_GOMP_parallel_start (fun._omp_fn.0, &.omp_data_o.1, 0);
fun._omp_fn.0 (&.omp_data_o.1);
__builtin_GOMP_parallel_end ();
data = .omp_data_o.1.data;
n = .omp_data_o.1.n;
return;
}
fun._omp_fn.0 (struct .omp_data_s.0 * .omp_data_i)
{
int n [value-expr: .omp_data_i->n];
double * data [value-expr: .omp_data_i->data];
...
<bb 3>:
i = 0;
D.1637 = .omp_data_i->n;
D.1638 = __builtin_omp_get_num_threads ();
D.1639 = __builtin_omp_get_thread_num ();
...
<bb 4>:
... this is the body of the loop ...
i = i + 1;
if (i < D.1644)
goto <bb 4>;
else
goto <bb 5>;
<bb 5>:
<bb 6>:
return;
...
}
I have omitted big portions of the output for brevity. This is not exactly C code. It is a C-like representation of the program flow. <bb N> are the so-called basic blocks: collections of statements treated as single blocks in the program's control flow. The first thing that one sees is that the parallel region gets extracted into a separate function. This is not uncommon - most OpenMP implementations do more or less the same code transformation. One can also observe that the compiler inserts calls to libgomp functions like GOMP_parallel_start and GOMP_parallel_end, which are used to bootstrap and then to finish the execution of a parallel region (the __builtin_ prefix is removed later on). Inside fun._omp_fn.0 there is a for loop, implemented in <bb 4> (note that the loop itself is also expanded). Also, all shared variables are put into a special structure that gets passed to the implementation of the parallel region. <bb 3> contains the code that computes the range of iterations over which the current thread would operate.
Well, not quite C code, but this is probably the closest thing one can get from GCC.
I haven't tested it with OpenMP, but the compiler option -E should give you the code after preprocessing. Note, though, that since OpenMP directives are pragmas rather than macros, they pass through the preprocessor unchanged.
