How to get pointer to pointer in LLVM? - clang

C++ code:
int main() {
int* k = new int(0);
int* j = k;
return 0;
}
clang++ -S -emit-llvm :
define dso_local i32 @main() #0 {
%1 = alloca i32, align 4
%2 = alloca i32*, align 8
%3 = alloca i32*, align 8
store i32 0, i32* %1, align 4
%4 = call i8* @_Znwm(i64 4) #2
%5 = bitcast i8* %4 to i32*
store i32 0, i32* %5, align 4
store i32* %5, i32** %2, align 8
%6 = load i32*, i32** %2, align 8
store i32* %6, i32** %3, align 8
ret i32 0
}
The question is about
store i32* %5, i32** %2, align 8
How is it possible to get an i32** from %2 (an i32*) without generating an additional LLVM value, like this (pseudocode):
%starstar = alloca(i32**)
store(%2, %starstar)
I do not see any bitcasts or something like this either.
%2 was i32* and then it is i32** in the store instruction.
I would like to know how.
Any help is appreciated.

Quoting the question: "%2 was i32* and then it is i32** in the store instruction."
%2 was never i32*. alloca T allocates memory for a value of type T and returns a pointer to that memory, so the result of alloca T has type T*. In your code that makes %2 an i32** and %1 an i32*.
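A rough C++ analogy may help here. It is not LLVM API code, just a sketch in which a local variable stands in for an alloca slot and its address stands in for the pointer that alloca returns. The function name store_pointer_into_slot and the variable names are made up for illustration.

```cpp
#include <type_traits>

// Analogy only: a local of type T plays the role of `alloca T`,
// and `&local` plays the role of the T* that the alloca yields.
int store_pointer_into_slot() {
    int pointee = 7;       // stands in for the object created by `new int`
    int* k = nullptr;      // a slot that holds an int*, like `alloca i32*`
    int** k_slot = &k;     // the slot's address: one level deeper, like %2

    static_assert(std::is_same<decltype(&pointee), int*>::value,
                  "the address of an int slot is int*");
    static_assert(std::is_same<decltype(&k), int**>::value,
                  "the address of an int* slot is int**");

    *k_slot = &pointee;    // like: store i32* %5, i32** %2
    return *k;             // reads the pointee through the stored pointer
}
```

No cast is involved anywhere: the extra level of indirection comes from the slot itself, which is why the IR needs no bitcast either.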

Related

Useless clang temporary in LLVM for `return 0` in simple C program [duplicate]

Here's a simple C file with an enum definition and a main function:
enum days {MON, TUE, WED, THU};
int main() {
enum days d;
d = WED;
return 0;
}
It compiles to the following LLVM IR:
define dso_local i32 @main() #0 {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
store i32 0, i32* %1, align 4
store i32 2, i32* %2, align 4
ret i32 0
}
%2 is evidently the d variable, which gets 2 assigned to it. What does %1 correspond to if zero is returned directly?
This %1 register was generated by clang to handle multiple return statements in a function. Imagine you were writing a function to compute an integer's factorial. Instead of this
int factorial(int n){
int result;
if(n < 2)
result = 1;
else{
result = n * factorial(n-1);
}
return result;
}
You'd probably do this
int factorial(int n){
if(n < 2)
return 1;
return n * factorial(n-1);
}
Why? Because Clang will insert the result variable that holds the return value for you. Yay. That's the reason for that %1 variable. Look at the IR for a slightly modified version of your code.
Modified code,
enum days {MON, TUE, WED, THU};
int main() {
enum days d;
d = WED;
if(d) return 1;
return 0;
}
IR,
define dso_local i32 @main() #0 !dbg !15 {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
store i32 0, i32* %1, align 4
store i32 2, i32* %2, align 4, !dbg !22
%3 = load i32, i32* %2, align 4, !dbg !23
%4 = icmp ne i32 %3, 0, !dbg !23
br i1 %4, label %5, label %6, !dbg !25
5: ; preds = %0
store i32 1, i32* %1, align 4, !dbg !26
br label %7, !dbg !26
6: ; preds = %0
store i32 0, i32* %1, align 4, !dbg !27
br label %7, !dbg !27
7: ; preds = %6, %5
%8 = load i32, i32* %1, align 4, !dbg !28
ret i32 %8, !dbg !28
}
Now you see %1 making itself useful, huh? In most functions with a single return statement, this variable is stripped by one of LLVM's passes.
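Written back as source code, the single-exit shape of that IR looks roughly like the second function below. This is only a hand-written sketch of the pattern, not something Clang produces as C++; the function names are invented for the comparison.

```cpp
// What the programmer writes: two return statements.
int with_two_returns(int d) {
    if (d) return 1;
    return 0;
}

// Roughly what the unoptimized IR does: each return stores into one
// slot (the %1 above) and branches to a single exit block that loads it.
int with_single_exit(int d) {
    int retval = 0;   // like the initial: store i32 0, i32* %1
    if (d) {
        retval = 1;   // block 5: store i32 1, i32* %1
        goto done;
    }
    retval = 0;       // block 6: store i32 0, i32* %1
    goto done;
done:                 // block 7: load the slot and return it
    return retval;
}
```

Both functions return the same value for every input; the second one just makes the hidden slot visible.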
Why does this matter — what's the actual problem?
I think the deeper answer you're looking for might be: LLVM's architecture is based around fairly simple frontends and many passes. The frontends have to generate correct code, but it doesn't have to be good code. They can do the simplest thing that works.
In this case, Clang generates a couple of instructions that turn out not to be used for anything. That's generally not a problem, because some part of LLVM will get rid of superfluous instructions. Clang trusts that to happen. Clang doesn't need to avoid emitting dead code; its implementation may focus on correctness, simplicity, testability, etc.
Because Clang is done with syntax analysis but LLVM hasn't even started with optimization.
The Clang front end has generated IR (Intermediate Representation), not machine code. Those variables are SSA (Static Single Assignment) values; they haven't been bound to registers yet and, after optimization, never will be, because they are redundant.
That code is a somewhat literal representation of the source. It is what clang hands to LLVM for optimization. Basically, LLVM starts with that and optimizes from there. Indeed, for version 10 and x86_64, llc -O2 will eventually generate:
main: # @main
xor eax, eax
ret

How to generate LLVM bitcode and disassembled code having similar variable names of source code

I am trying to generate LLVM bitcode and disassembled (.ll) code from C source code. I want the instructions in the bitcode to use variable names similar to those in the source code.
Suppose I have a source code (sample.c):
int test(int a){
return a++;
}
The sample.ll contains :
; Function Attrs: noinline nounwind uwtable
define i32 @test(i32) #0 {
%2 = alloca i32, align 4
store i32 %0, i32* %2, align 4
%3 = load i32, i32* %2, align 4
%4 = add nsw i32 %3, 1
store i32 %4, i32* %2, align 4
ret i32 %3
}
Here, %0 corresponds to variable a in the source code.
How can I generate a sample.ll like this?
; Function Attrs: noinline nounwind
define i32 @test(i32 %a) #0 {
entry:
%a.addr = alloca i32, align 4
store i32 %a, i32* %a.addr, align 4
%0 = load i32, i32* %a.addr, align 4
%inc = add nsw i32 %0, 1
store i32 %inc, i32* %a.addr, align 4
ret i32 %0
}
Where %a resembles variable a in the source code.
NB: The clang version I am using is 6.0.0-1ubuntu2~16.04.1
I am using the command : clang -Xclang -disable-O0-optnone -O0 -emit-llvm -c sample.c -o sample.bc and then llvm-dis sample.bc
The thing you want to name isn't an Instruction, it's an Argument. The Argument constructor takes a Name argument, which is probably the intended way to set that. I've no idea why clang doesn't do that in your case. You can also call setName() later.
Making instructions have names follows the same pattern, provided that they don't have a void type. In your example, alloca and inc both have names. Making the load have a name would usually be done by passing a NameStr argument. setName() works on Instructions too (both Instruction and Argument inherit Value).

CLang++ generating spurious vars in LLVM_IR

Please consider the following program:
int main() {
int test = 17;
return test;
}
Compile to LLVM_IR: clang++ -S -emit-llvm test.cpp
Looking at the IR, the function main is defined as so:
; Function Attrs: noinline norecurse nounwind optnone uwtable
define dso_local i32 @main() #0 {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
store i32 0, i32* %1, align 4
store i32 17, i32* %2, align 4
%3 = load i32, i32* %2, align 4
ret i32 %3
}
We can see that %2 is the allocation of our test variable, with 17 stored into it, and %3 loads that variable to use as the function's return value (in keeping with the code as we wrote it). However, we see that %1 defines another int-sized variable and initializes it to 0, despite never using it. This extra variable is nowhere to be seen in the C++ source.
I should note that I see the same being generated when I compile using clang rather than clang++.
What is this extra variable?
I assume you are using an old version of clang. In newer versions (I mean v7.0 and later), value names are printed by default. To request them explicitly, you can use -fno-discard-value-names. With this option you'll get the following IR:
define dso_local i32 @main() #0 {
entry:
%retval = alloca i32, align 4
%test = alloca i32, align 4
store i32 0, i32* %retval, align 4
store i32 17, i32* %test, align 4
%0 = load i32, i32* %test, align 4
ret i32 %0
}
Now it is quite clear where the store of 0 comes from. In unoptimized code, the compiler initializes retval to 0.

How to save the variable name when use clang to generate llvm ir?

I generate IR using 'clang -S -emit-llvm test.c'.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
int* a=0;
a=(int *)malloc(sizeof(int));
printf("hello world\n");
return 0;
}
and this is the ir:
define i32 @main(i32, i8**) #0 {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i8**, align 8
%6 = alloca i32*, align 8
store i32 0, i32* %3, align 4
store i32 %0, i32* %4, align 4
store i8** %1, i8*** %5, align 8
store i32* null, i32** %6, align 8
%7 = call noalias i8* @malloc(i64 4) #3
%8 = bitcast i8* %7 to i32*
store i32* %8, i32** %6, align 8
%9 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str, i32 0, i32 0))
ret i32 0
}
How can I make the variable names remain unchanged, so that a is still %a and not %3?
Actually, dropping variable names is a feature that is activated with -discard-value-names. A release build of Clang does this on its own (a self-compiled Clang in debug mode does not).
You can circumvent it with
clang <your-command-line> -###
Then copy the output and drop -discard-value-names.
Newer clang versions (since 7) expose the flag on the normal command line:
clang -fno-discard-value-names <your-command-line>
There is no such way. Variable names in LLVM IR are for debugging only, and there is certainly no way to preserve them when the code is converted to full SSA form.
If you need to preserve source code information consider using debug info.

Why does the clang compiler put these instructions at the start of every function which takes arguments?

I used clang to compile this code with -S -emit-llvm:
int sub2(int n) {
return n - 2;
}
And this is the code it outputted:
; Function Attrs: nounwind
define i32 @_Z4sub2i(i32) #0 {
%2 = alloca i32, align 4
store i32 %0, i32* %2, align 4
%3 = load i32, i32* %2, align 4
%4 = sub nsw i32 %3, 2
ret i32 %4
}
However, I could write the same function as:
define i32 @sub2(i32) #0 {
%2 = sub i32 %0, 2
ret i32 %2
}
Why does it add those instructions? I am not sure about it, but it seems to be copying the argument.
This is because you haven't run the mem2reg pass. The variables are considered to occupy space on the stack and are alloca'd.
If you try
opt --mem2reg filename.ll -S
you will see that you get something similar to what you expected.
mem2reg is also a part of O1, O2, and O3.
The mem2reg pass tries to convert "variables" into LLVM temporaries. It does this only for those variables whose address is not taken.
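To make the "address is not taken" condition concrete, here is a small source-level sketch. The helper subtract_two_in_place and the name sub2_via_pointer are hypothetical, added only for contrast with the original sub2. Whether a stack slot actually survives in optimized output also depends on other passes (inlining, for example), so treat this as an illustration of the distinction rather than a statement about the final code.

```cpp
// Candidate for promotion: `n` is only read and written directly, so
// its stack slot is used by nothing but plain loads and stores.
int sub2(int n) {
    return n - 2;
}

// Hypothetical helper that modifies an int through a pointer.
void subtract_two_in_place(int* p) {
    *p -= 2;
}

// Here the address of `n` is passed to another function, so the slot
// has a use that is not a plain load or store of the slot itself.
int sub2_via_pointer(int n) {
    subtract_two_in_place(&n);
    return n;
}
```

Both versions compute the same result; they differ only in how the local is used, which is what mem2reg looks at.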
