Seemingly unnecessary bitcast -> phi -> bitcast being generated

Seemingly unnecessary bitcast -> phi -> bitcast being generated - clang

I am having a problem where Clang is seemingly generating unnecessary bitcasts (from f32 -> i32 -> f32. The following piece is the generated IR (with stripped out irrelevant pieces. I tried highlighting the relevant lines but that does not work in a code block.
The problem is the bitcast defining %5, followed by the duplicated phi nodes (%6 and %7), and a bitcast store which is the last line. I don't see a reason why this datapath is bitcasted into i32, it is not being used anywhere else.
The corresponding c code contains only float data types, some structs with nested arrays as data types, if elseif else constructs with some floating point constants (it is generated code). No clue as on why.
(ccode: https://pastebin.com/c0gYkwhF, llvm ir: https://pastebin.com/NSyhSkUa)
tldr;
Why is the bitcast to i32 being generated
Can I prevent this somehow (don't want to deal with i32 datatype)
; Function Attrs: nofree norecurse nounwind uwtable
define dso_local void #CurrentControl_step() local_unnamed_addr #0 {
entry:
; (...)
%mul3 = fmul fast float %4, 0x3FEFFFEF80000000
%add4 = fadd fast float %3, %mul3
%cmp = fcmp fast ogt float %add4, 0x3FEF400000000000
br i1 %cmp, label %if.then, label %if.else
; (...)
if.else7: ; preds = %if.else
store float %add4, float* getelementptr inbounds (%struct.DW_CurrentControl_T, %struct.DW_CurrentControl_T* #CurrentControl_DW, i64 0, i32 1, i64 0), align 4, !tbaa !2
%5 = bitcast float %add4 to i32
br label %if.end8
if.end8: ; preds = %if.then6, %if.else7, %if.then
%6 = phi i32 [ -1082523648, %if.then6 ], [ %5, %if.else7 ], [ 1064960000, %if.then ]
%7 = phi float [ 0xBFEF400000000000, %if.then6 ], [ %add4, %if.else7 ], [ 0x3FEF400000000000, %if.then ]
; (...)
%9 = fsub fast float %7, %mul9
; (...)
store i32 %6, i32* bitcast (float* getelementptr inbounds (%struct.DW_CurrentControl_T, %struct.DW_CurrentControl_T* #CurrentControl_DW, i64 0, i32 2, i64 0) to i32*), align 4, !tbaa !2
; (...)
EDIT:
I have a new minimal example generating a seemingly unnecessary bitcast:
typedef struct {
float nestedArray[2];
} Foo;
Foo s;
float testVar;
void test(void) {
testVar = s.nestedArray[0];
}
which generates:
%struct.Foo = type { [2 x float] }
#s = dso_local local_unnamed_addr global %struct.Foo zeroinitializer, align 4
#var = dso_local local_unnamed_addr global float 0.000000e+00, align 4
; Function Attrs: nofree norecurse nounwind uwtable
define dso_local void #test() local_unnamed_addr #0 {
entry:
%0 = load i32, i32* bitcast (%struct.Foo* #s to i32*), align 4, !tbaa !2
store i32 %0, i32* bitcast (float* #var to i32*), align 4, !tbaa !2
ret void
}

Bitcast was caused by: doc
// Try to canonicalize loads which are only ever stored to operate over
// integers instead of any other type. We only do this when the loaded type
// is sized and has a size exactly the same as its store size and the store
// size is a legal integer type.
// Do not perform canonicalization if minmax pattern is found (to avoid
// infinite loop).

Related

How can I get variable name of GEP, and use it to create a if then instruction?

I am a beginner in llvm ir, and now I am doing a llvm transform pass to add some code in C program, but I have some trouble in it.
The source C code is
void foo(int i) {
a[i] = -1;
}
and its llvm ir is：
define void #foo(i32 noundef %0) #0 {
%2 = alloca i32, align 4
store i32 %0, ptr %2, align 4
%3 = load i32, ptr %2, align 4
%4 = sext i32 %3 to i64
%5 = getelementptr inbounds [10 x i32], ptr #a, i64 0, i64 %4
store i32 -1, ptr %5, align 4
ret void
}
Now I want to get the variable i, and build an if then instruction, which looks like:
void foo(int i) {
a[i] = -1;
if (i > 5)
// do some things
}
My question is that how can I get the variable i and build an if statement in llvm ir? now I just know use llvm built-in function SplitBlockAndInsertIfThen to create an if…then… statement, but how can I get i first? Waiting for someone to answer my question, or give an idea. Sincerely!

Global or Local variable different builder behavior

I'm still doing practical with llvm-c Api,
I have a doubt about this code:
I got the original code from source.
In Delphi is:
procedure test;
(*int A[1024];
int main(){
int B[1024];
A[50] = A[49] + 5;
B[0] = B[1] + 10;
return 0;
}
*)
var
context : TLLVMContextRef ;
module : TLLVMModuleRef;
builder : TLLVMBuilderRef;
typeA,
typeB,
mainFnReturnType : TLLVMTypeRef;
arrayA,
arrayB,
mainFn,
Zero64,
temp,
temp2,
returnVal,
ptr_A_49,
ptr_B_1,
elem_A_49 ,
elem_B_1,
ptr_A_50,
ptr_B_0 : TLLVMValueRef;
entryBlock,
endBasicBlock : TLLVMBasicBlockRef;
indices : array[0..1] of TLLVMValueRef;
begin
context := LLVMGetGlobalContext;
module := LLVMModuleCreateWithNameInContext('meu_modulo.bc', context);
builder := LLVMCreateBuilderInContext(context);
//
// Declara o tipo do retorno da função main.
mainFnReturnType := LLVMInt64TypeInContext(context);
// Cria a função main.
mainFn := LLVMAddFunction(module, 'main', LLVMFunctionType(mainFnReturnType, nil, 0, False));
// Declara o bloco de entrada.
entryBlock := LLVMAppendBasicBlockInContext(context, mainFn, 'entry');
// Declara o bloco de saída.
endBasicBlock := LLVMAppendBasicBlock(mainFn, 'end');
// Adiciona o bloco de entrada.
LLVMPositionBuilderAtEnd(builder, entryBlock);
// Cria um valor zero para colocar no retorno.
Zero64 := LLVMConstInt(LLVMInt64Type(), 0, false);
// Cria o valor de retorno e inicializa com zero.
returnVal := LLVMBuildAlloca(builder, LLVMInt64Type, 'retorno');
LLVMBuildStore(builder, Zero64, returnVal);
//
// Array global de 1024 elementos.
typeA := LLVMArrayType(LLVMInt64Type, 1024);
arrayA := LLVMBuildArrayAlloca(builder, typeA, LLVMConstInt(LLVMInt64Type, 0, false), 'A'); //LLVMAddGlobal (module, typeA, 'A');
LLVMSetAlignment(arrayA, 16);
// Array local de 1024 elementos.
typeB := LLVMArrayType(LLVMInt64Type(), 1024);
arrayB := LLVMBuildArrayAlloca(builder, typeB, LLVMConstInt(LLVMInt64Type, 0, false), 'B');
LLVMSetAlignment(arrayB, 16);
// A[50] = A[49] + 5;
// Na documentação diz para usar dois indices, o primeiro em zero: http://releases.llvm.org/2.3/docs/GetElementPtr.html#extra_index
// The first index, i64 0 is required to step over the global variable %MyStruct. Since the first argument to the GEP instruction must always be a value of pointer type, the first index steps through that pointer. A value of 0 means 0 elements offset from that pointer.
indices[0] := LLVMConstInt(LLVMInt32Type, 0, false);
indices[1] := LLVMConstInt(LLVMInt32Type, 49, false);
ptr_A_49 := LLVMBuildInBoundsGEP(builder, arrayA, #indices[0], 2, 'ptr_A_49"');
TFile.WriteAllText('Func.II',LLVMDumpValueToStr(mainFn));
elem_A_49 := LLVMBuildLoad(builder, ptr_A_49, 'elem_of_A');
temp := LLVMBuildAdd(builder, elem_A_49, LLVMConstInt(LLVMInt64Type(), 5, false), 'temp');
indices[0] := LLVMConstInt(LLVMInt32Type(), 0, false);
indices[1] := LLVMConstInt(LLVMInt32Type(), 50, false);
ptr_A_50 := LLVMBuildInBoundsGEP(builder, arrayA, #indices[0], 2, 'ptr_A_50');
LLVMBuildStore(builder, temp, ptr_A_50);
//
// B[0] = B[1] + 10;
indices[0] := LLVMConstInt(LLVMInt32Type, 0, false);
indices[1] := LLVMConstInt(LLVMInt32Type, 1, false);
ptr_B_1 := LLVMBuildInBoundsGEP(builder, arrayB, #indices[0], 2, 'ptr_B_1');
elem_B_1:= LLVMBuildLoad(builder, ptr_B_1, 'elem_of_B');
temp2 := LLVMBuildAdd(builder, elem_B_1, LLVMConstInt(LLVMInt64Type(), 10, false), 'temp2');
indices[0] := LLVMConstInt(LLVMInt32Type, 0, false);
indices[1] := LLVMConstInt(LLVMInt32Type, 0, false);
ptr_B_0 := LLVMBuildInBoundsGEP(builder, arrayB, #indices[0], 2, 'ptr_B_0');
LLVMBuildStore(builder, temp2, ptr_B_0);
//
// Cria um salto para o bloco de saída.
LLVMBuildBr(builder, endBasicBlock);
// Adiciona o bloco de saída.
LLVMPositionBuilderAtEnd(builder, endBasicBlock);
// Cria o return.
LLVMBuildRet(builder, LLVMBuildLoad(builder, returnVal, ''));
// Imprime o código do módulo.
//LLVMDumpModule(module);
TFile.WriteAllText('Func.II',LLVMDumpValueToStr(mainFn));
// Escreve para um arquivo no formato bitcode.
if (LLVMWriteBitcodeToFile(module, 'meu_modulo.bc').ResultCode <> 0) then
raise Exception.Create('error writing bitcode to file, skipping');
end;
the problem is here, if arrayA is a global variable:
// Array global de 1024 elementos.
typeA: = LLVMArrayType (LLVMInt64Type, 1024);
arrayA: = LLVMAddGlobal (module, typeA, 'A');
LLVMSetAlignment (arrayA, 16);
....
....
ptr_A_49: = LLVMBuildInBoundsGEP (builder, arrayA, #indices [0], 2, 'ptr_A_49 "');
TFile.WriteAllText ( 'Func.II', LLVMDumpValueToStr (mainFn));
the gep instruction is not transferred to the code, in fact the output is
define i64 #main () {
entry:
% retorno = allocates i64
i64 0 store, i64 *% return
% B = allocates [1024 x i64], i64 0, align 16
end:; No predecessors!
}
if ArrayA is a local variable:
// Array global de 1024 elementos.
typeA: = LLVMArrayType (LLVMInt64Type, 1024);
arrayA: = LLVMBuildArrayAlloca (builder, typeA, LLVMConstInt (LLVMInt64Type, 0, false), 'A'); // LLVMAddGlobal (module, typeA, 'A');
LLVMSetAlignment (arrayA, 16);
.....
....
ptr_A_49: = LLVMBuildInBoundsGEP (builder, arrayA, #indices [0], 2, 'ptr_A_49 "');
TFile.WriteAllText ( 'Func.II', LLVMDumpValueToStr (mainFn));
the gep instruction is transferred to the code, in fact the output is:
define i64 #main () {
entry:
% retorno = allocates i64
i64 0 store, i64 *% return
% A = allocates [1024 x i64], i64 0, align 16
% B = allocates [1024 x i64], i64 0, align 16
% "ptr_A_49 22" = getelementptr inbounds [1024 x i64], [1024 x i64] *% A, i32 0, i32 49
end:; No predecessors!
}
why?

Reply from llvm-dev mainling list
With 'A' being a global variable, the GEP becomes a ConstantExpr
(GetElementPtrConstantExpr instead of GetElementPtrInst) since all of
its arguments are constant. ConstantExpr are "free-floating", i.e. not
int a BasicBlock's instruction list and therefore only appear in the
printout when used.
Michael
Other user
LLVM has roughly[1] two kinds of Value: Constants and Instructions.
Constants are things like literal constants, (addresses of) global
variables, and various expressions based just on those things; they're
designed to be values that can be directly calculated by the compiler
and/or linker without any CPU instructions actually being executed[2].
Instructions on the other hand sit inside Functions as real entities,
they produce %whatever Values and, unless optimized away, will be
turned into real CPU instructions in the end.
So, you were asking for "GEP something, 0, 49". If that "something" is
a Constant (e.g. a GlobalVariable) then that GEP only depends on
Constants so it can be a ConstantExpr too, written
"getelementptr([1024 x i64], [1024 x i64]* #var, i32 0, i32 49)". That
Constant is then not inserted into a block (it's not an instruction so
it can't be). Instead it's written directly in any instruction that
uses it, so if you actually use the GEP you might see something like:
%val = load i64, i64* getelementptr([1024 x i64], [1024 x i64]*
#var, i32 0, i32 49)
Until you use it, it's not actually in the function anywhere though.
You just have a handle when needed.
On the other hand if the "something" is a local variable, then the GEP
needs to be an actual instruction inside a function and the API you're
using will insert it automatically.
In the Constant case, you can manually create an instruction anyway,
at least in C++. I'm afraid I haven't used the C API and couldn't see
an obvious way there, but you probably don't want to since
optimization would quickly undo it and turn it back into a Constant.
Cheers.
Tim.
[1] There's also Arguments, representing function parameters. They
behave like Instructions for these purposes.
[2] But you can build pathological Constants that no linker really
could calculate like 4 * #global. That tends to result in a compiler
error.

How to generate LLVM SSA Format

I write the following C code where variable X is being assigned twice:
int main()
{
int x;
x = 10;
x = 20;
return 0;
}
Compile and generate IR representation using the following command
clang -emit-llvm -c ssa.c
IR generated
; Function Attrs: nounwind uwtable
define i32 #main() #0 {
entry:
%retval = alloca i32, align 4
%x = alloca i32, align 4
store i32 0, i32* %retval
store i32 10, i32* %x, align 4
store i32 20, i32* %x, align 4
ret i32 0
}
If my understanding of SSA format is correct, we should in this example see x1 and x2 as two LLVM IR variables generated and assigned two values 10 and 20 respectively. Is there some specific option we should compile with to get SSA IR representation or my understanding of IR representation is incorrect? Please advise.
EDIT: as suggested in one answer, using -mem2reg optimization pass gives me the following output
clang -c -emit-llvm ssa.c -o ssa.bc
opt -mem2reg ssa.bc -o ssa.opt.bc
llvm-dis ssa.opt.bc
cat ssa.opt.ll
Resultant IR generated
; Function Attrs: nounwind uwtable
define i32 #main() #0 {
entry:
ret i32 0
}
it looks like the entire x assignment got optimized using mem2reg optimization. Any other way to generate and retain different x values?

LLVM passes mem2reg and reg2mem convert code to/from SSA form. You can run them using opt tool.

bitcast integer to vector of char

I just compiled a small piece of C code using clang 3.7:
typedef unsigned char char4 __attribute__ ((vector_size (4)));
char4 f1 (char4 v)
{
return v / 2;
}
That functions compile to (I removed debuginfo):
define <4 x i8> #f1(<4 x i8> %v) {
entry:
%div = udiv <4 x i8> %v, bitcast (<1 x i32> <i32 2> to <4 x i8>)
ret <4 x i8> %div
}
According to llvm documentation, bitcast operation doesn’t change bits, meaning to <4 x i8> should yield <2, 0, 0, 0> (or <0, 0, 0, 2>). Am I right?
Therefore, I’ll get Division by Zero exception.
The code I wrote intended to make a broadcast (or splat), and not a bitcast.
Could someone please explain what’s happening?
Thanks!

actually it looks like a bug in clang:
https://llvm.org/bugs/show_bug.cgi?id=27085
this input code should either not compile, or generate a warning, or compile to a vector splat

Empty while loop hangs in iPhone release build

For some reason having an empty while loop in a release build hangs, while having it in the debug build works fine. This example works in debug but hangs in release:
//Wait for stream to open
while (!_isReadyForData);
This is the solution I came up with in order to get it to work in release:
//Wait for stream to open
while (!_isReadyForData)
{
//For some reason in release mode, this is needed
sleep(.5);
}
I am just curious why I would need to add something in the loop block of code.

The reason is of course due to compilers optimizations, as already noted in the comments.
Remembering that Objective-C is built on top of C, I put together a simple C example with different levels of optimizations and here's the result.
Original code
int main(int argc, char const *argv[]) {
char _isReadyForData = 0;
while (!_isReadyForData);
return 0;
}
LLVM IR with no optimizations (-O0)
define i32 #main(i32 %argc, i8** %argv) #0 {
entry:
%retval = alloca i32, align 4
%argc.addr = alloca i32, align 4
%argv.addr = alloca i8**, align 8
%_isReadyForData = alloca i8, align 1
store i32 0, i32* %retval
store i32 %argc, i32* %argc.addr, align 4
store i8** %argv, i8*** %argv.addr, align 8
store i8 0, i8* %_isReadyForData, align 1
br label %while.cond
while.cond: ; preds = %while.body, %entry
%0 = load i8* %_isReadyForData, align 1
%tobool = icmp ne i8 %0, 0
%lnot = xor i1 %tobool, true
br i1 %lnot, label %while.body, label %while.end
while.body: ; preds = %while.cond
br label %while.cond
while.end: ; preds = %while.cond
ret i32 0
}
LLVM IR with level 1 optimizations (-O1)
define i32 #main(i32 %argc, i8** nocapture %argv) #0 {
entry:
br label %while.cond
while.cond: ; preds = %while.cond, %entry
br label %while.cond
}
As you can see, the compiler produces an infinite loop when optimizing, since the local variable _isReadyForData is useless in that context and therefore is removed.
As suggested by #faffaffaff, using the volatile keyword on _isReadyForData may solve the issue.
LLVM IR with level 1 optimizations (-O1) with volatile keyword
define i32 #main(i32 %argc, i8** nocapture %argv) #0 {
entry:
%_isReadyForData = alloca i8, align 1
store volatile i8 0, i8* %_isReadyForData, align 1
br label %while.cond
while.cond: ; preds = %while.cond, %entry
%_isReadyForData.0.load1 = load volatile i8* %_isReadyForData, align 1
%lnot = icmp eq i8 %_isReadyForData.0.load1, 0
br i1 %lnot, label %while.cond, label %while.end
while.end: ; preds = %while.cond
ret i32 0
}
But I definitely agree with #rmaddy in saying that you'd better change the flow of your program and use driven logic, instead of patching what you have already.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Seemingly unnecessary bitcast -> phi -> bitcast being generated - clang

Related

How can I get variable name of GEP, and use it to create a if then instruction?

Global or Local variable different builder behavior

How to generate LLVM SSA Format

bitcast integer to vector of char

Empty while loop hangs in iPhone release build

Categories

Resources